System Architecture Overview
Complete architectural overview of the LLM Platform ecosystem
High-Level Architecture
The LLM Platform is an enterprise-grade AI orchestration system built on a distributed microservices architecture, providing unified management for multiple AI providers, vector databases, and workflow automation.
```mermaid
graph TB
    subgraph "Frontend Layer"
        UI[Drupal 11 UI]
        CLI[BuildKit CLI]
        IDE[AgentStudio IDE]
    end
    subgraph "API Gateway Layer"
        Gateway[LLM Gateway<br/>Multi-Provider Routing]
        MCP[MCP Registry<br/>Model Context Protocol]
        API[API Normalizer<br/>Standardization Layer]
    end
    subgraph "Orchestration Layer"
        BuildKit[Agent BuildKit<br/>BAR Runtime]
        Mesh[Agent Mesh<br/>Coordination Layer]
        Orchestra[AI Agent Orchestra<br/>Workflow Engine]
    end
    subgraph "Agent Layer"
        Workers[Worker Agents<br/>TDD, API Builder, Doc Sync]
        Governors[Governor Agents<br/>Version Sync, Branch Policy]
        Critics[Critic Agents<br/>Security, Performance, Quality]
        Observers[Observer Agents<br/>Metrics, Roadmap, Health]
    end
    subgraph "AI/ML Layer"
        Anthropic[Anthropic Claude]
        OpenAI[OpenAI GPT]
        Ollama[Ollama Local Models]
        HuggingFace[HuggingFace Models]
    end
    subgraph "Data Layer"
        Postgres[(PostgreSQL<br/>Relational Data)]
        Redis[(Redis<br/>Cache & Sessions)]
        Qdrant[(Qdrant<br/>Vector DB)]
        MongoDB[(MongoDB<br/>Document Store)]
        Neo4j[(Neo4j<br/>Knowledge Graph)]
    end
    subgraph "Observability Layer"
        Tracer[Agent Tracer<br/>AI Ops Intelligence]
        Phoenix[Phoenix Arize<br/>LLM Tracing]
        Prometheus[Prometheus<br/>Metrics]
        Grafana[Grafana<br/>Dashboards]
        Jaeger[Jaeger<br/>Distributed Tracing]
    end
    UI --> Gateway
    CLI --> BuildKit
    IDE --> Mesh
    Gateway --> Orchestra
    MCP --> Mesh
    API --> Gateway
    Orchestra --> Mesh
    BuildKit --> Workers
    BuildKit --> Governors
    BuildKit --> Critics
    BuildKit --> Observers
    Mesh --> Workers
    Mesh --> Governors
    Mesh --> Critics
    Mesh --> Observers
    Workers --> Anthropic
    Workers --> OpenAI
    Workers --> Ollama
    Workers --> HuggingFace
    Orchestra --> Postgres
    Orchestra --> Redis
    Orchestra --> Qdrant
    Orchestra --> MongoDB
    Orchestra --> Neo4j
    Mesh --> Tracer
    Workers --> Phoenix
    Tracer --> Prometheus
    Prometheus --> Grafana
    Tracer --> Jaeger
```
Core Architectural Principles
1. Separation of Concerns
- Frontend Layer: User interfaces (Drupal, CLI, IDE)
- API Gateway Layer: Request routing and normalization
- Orchestration Layer: Workflow coordination and agent management
- Agent Layer: Autonomous task execution
- Data Layer: Persistent storage and caching
- Observability Layer: Monitoring and analytics
2. Distributed by Design
- Microservices architecture with clear service boundaries
- gRPC for high-performance inter-service communication
- REST APIs for human-friendly interfaces
- Event-driven architecture using message queues
3. OSSA Compliance
- Open Standards for Scalable Agents (OSSA 1.0)
- Standardized agent manifests and capabilities
- Cross-platform agent interoperability
- Protocol-based communication
4. Observability First
- Distributed tracing for all operations
- Real-time metrics collection
- Structured logging with correlation IDs
- AI-specific observability (Phoenix Arize)
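As a rough illustration of structured logging tied to a correlation ID, the sketch below emits one JSON object per log line and attaches the same ID to every line written while a request is being handled. The field names, the `build_logger` and `handle_request` helpers, and the `llm-gateway` logger name are assumptions made for this example, not part of the platform.

```python
import contextvars
import io
import json
import logging
import uuid

# Hypothetical context variable holding the correlation ID of the current request.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object (field names are illustrative)."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "service": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        })


def build_logger(service, stream):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


def handle_request(logger, incoming_id=None):
    """Reuse the caller's correlation ID if present, otherwise mint a new one."""
    token = correlation_id.set(incoming_id or uuid.uuid4().hex)
    try:
        logger.info("request accepted")
        logger.info("request completed")
        return correlation_id.get()
    finally:
        correlation_id.reset(token)


# Demo: both log lines carry the same correlation ID.
stream = io.StringIO()
logger = build_logger("llm-gateway", stream)
request_id = handle_request(logger, "req-123")
records = [json.loads(line) for line in stream.getvalue().splitlines()]
```

Passing the ID along with outgoing calls (for example in a request header) is what lets logs from different services be joined later; that propagation step is left out here.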
Component Overview
Frontend Layer
Drupal 11 Platform
- Purpose: Enterprise CMS and administrative UI
- Technology: PHP 8.3+, Symfony components
- Features:
- Multi-site management
- Content workflow automation
- User authentication and RBAC
- Custom modules for AI integration
- URL: https://llm-platform.ddev.site
BuildKit CLI
- Purpose: Command-line orchestration and automation
- Technology: TypeScript, Node.js 20+
- Features:
- Agent lifecycle management
- Workflow orchestration (ROE)
- Real-time streaming (VORTEX v3)
- Quality testing (QITS)
- Resource management (SWARM)
- Command: buildkit --help
AgentStudio IDE
- Purpose: Multi-platform development environment
- Technology: VSCode extension, web-based UI
- Features:
- AI-assisted coding
- Agent integration
- Real-time collaboration
- Test-driven development
API Gateway Layer
LLM Gateway
- Purpose: Unified multi-provider AI routing
- Technology: Node.js, Express, TypeScript
- Providers: Anthropic, OpenAI, Google, Cohere, local models
- Features:
- Intelligent routing and failover
- Cost optimization
- Rate limiting
- Request/response caching
- Port: 4000
- Endpoint: http://localhost:4000/api/v1
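The routing-and-failover behaviour listed above can be pictured with a small sketch: try providers in priority order and fall through to the next one when a call fails. This is a minimal illustration under assumptions, not the gateway's actual logic; `ProviderError`, `route_with_failover`, and the two stand-in provider functions are hypothetical names.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a provider call that could not be completed."""


# A provider is a (name, callable) pair; the callable takes a prompt and returns text.
Provider = tuple[str, Callable[[str], str]]


def route_with_failover(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Return (provider name, reply) from the first provider that succeeds."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            # Remember why this provider failed and move on to the next one.
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


def flaky_primary(prompt: str) -> str:
    """Stand-in for a remote provider that is currently unavailable."""
    raise ProviderError("rate limited")


def local_fallback(prompt: str) -> str:
    """Stand-in for a local model that always answers."""
    return f"echo: {prompt}"


used, reply = route_with_failover(
    "hello",
    [("primary", flaky_primary), ("fallback", local_fallback)],
)
```

A cost-aware router could reorder the provider list per request before calling this function; the ordering policy is deliberately left outside the sketch.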
MCP Registry
- Purpose: Model Context Protocol service management
- Technology: TypeScript, JSON-RPC 2.0
- Features:
- Tool registration and discovery
- Context sharing across agents
- Protocol compliance validation
- Version negotiation
- Specification: MCP Protocol v1.0
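Because the registry speaks JSON-RPC 2.0, a tool-discovery exchange can be sketched as a request/response pair. The example below builds a request and validates a reply against the JSON-RPC 2.0 envelope rules; the `tools/list` method name follows the Model Context Protocol, while the helper names and the hand-written reply (including the `doc_sync` tool) are assumptions for illustration only.

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing request IDs


def make_request(method, params=None):
    """Build a JSON-RPC 2.0 request object."""
    request = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


def parse_response(raw, expected_id):
    """Validate a JSON-RPC 2.0 response and return its result."""
    message = json.loads(raw)
    if message.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if message.get("id") != expected_id:
        raise ValueError("response id does not match request id")
    if "error" in message:
        err = message["error"]
        raise RuntimeError(f"{err['code']}: {err['message']}")
    return message["result"]


request = make_request("tools/list")

# Hand-written reply standing in for what a registry might return.
raw_reply = json.dumps({
    "jsonrpc": "2.0",
    "id": request["id"],
    "result": {"tools": [{"name": "doc_sync"}]},
})
tools = parse_response(raw_reply, request["id"])["tools"]
```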
API Normalizer
- Purpose: Standardize requests/responses across providers
- Technology: Drupal module, PHP
- Features:
- Provider-agnostic interfaces
- Schema validation
- Response transformation
- Error normalization
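Error normalization, as listed above, amounts to mapping each provider's error payload onto one shared shape. The sketch below shows the idea with two made-up payload layouts; `provider_a`, `provider_b`, the `NormalizedError` fields, and the set of retryable codes are all hypothetical and do not describe any real provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedError:
    provider: str
    code: str
    message: str
    retryable: bool


# Illustrative set of codes a caller might retry; not taken from any provider.
RETRYABLE_CODES = {"rate_limited", "overloaded", "timeout"}


def normalize_error(provider, payload):
    """Map a provider-specific error payload onto NormalizedError."""
    if provider == "provider_a":
        # Hypothetical layout: {"error": {"type": ..., "message": ...}}
        code, message = payload["error"]["type"], payload["error"]["message"]
    elif provider == "provider_b":
        # Hypothetical layout: {"code": ..., "detail": ...}
        code, message = payload["code"], payload["detail"]
    else:
        code, message = "unknown", str(payload)
    return NormalizedError(provider, code, message, code in RETRYABLE_CODES)


err_a = normalize_error(
    "provider_a", {"error": {"type": "rate_limited", "message": "slow down"}}
)
err_b = normalize_error("provider_b", {"code": "invalid_request", "detail": "bad schema"})
```

Callers can then branch on `retryable` without knowing which provider produced the error.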
Orchestration Layer
Agent BuildKit (BAR Runtime)
- Purpose: Enterprise autonomous agent platform
- Technology: TypeScript, Kubernetes, Helm
- Components:
- ROE (Runtime Orchestration Engine): Multi-agent coordination
- VORTEX v3: Real-time streaming and vector operations
- QITS: AI-powered quality intelligence testing
- SWARM: Dynamic resource management and scaling
- Features:
- Sequential thinking workflow (8 stages)
- GitLab CE integration
- 25+ OSSA-compliant agents
- API-first architecture with 30+ endpoints
Agent Mesh
- Purpose: Backend coordination layer for distributed agents
- Technology: gRPC, Protocol Buffers, WebSocket
- Features:
- Agent-to-agent communication
- Load balancing and routing
- Health monitoring and failover
- Circuit breaking
- mTLS security
- Ports: 3005 (REST), 50051 (gRPC)
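Circuit breaking, one of the features listed above, can be sketched in a few lines: after a number of consecutive failures the breaker stops forwarding calls for a cool-down period, then lets a single trial call through. The class below is a generic illustration, not the mesh's implementation; the thresholds and the injectable `clock` parameter are assumptions chosen to keep the demo deterministic.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # time the circuit opened, or None while closed

    def call(self, func, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit is open; call skipped")
            # Cool-down elapsed: half-open, let one trial call through.
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def failing():
    raise ConnectionError("agent unreachable")


# Demo with a fake clock so the cool-down can be stepped manually.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_after=10.0, clock=lambda: now[0])
outcomes = []
for _ in range(3):
    try:
        breaker.call(failing)
    except CircuitOpenError:
        outcomes.append("skipped")
    except ConnectionError:
        outcomes.append("failed")
now[0] = 11.0  # cool-down has passed; the next call is a trial
outcomes.append(breaker.call(lambda: "ok"))
```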
AI Agent Orchestra
- Purpose: Workflow automation and agent coordination
- Technology: Drupal module, PHP, Temporal
- Features:
- Workflow definition and execution
- Agent task assignment
- Dependency resolution
- State management
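Dependency resolution for a workflow is essentially a topological sort: a task may run only after everything it depends on has finished. The sketch below uses Python's standard `graphlib` module; the task names and the `workflow` mapping are invented for illustration and do not come from the Orchestra module.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies):
    """Return tasks in an order that respects their dependencies.

    `dependencies` maps each task to the set of tasks it depends on.
    """
    return list(TopologicalSorter(dependencies).static_order())


def is_resolvable(dependencies):
    """A workflow with a dependency cycle can never be scheduled."""
    try:
        execution_order(dependencies)
        return True
    except CycleError:
        return False


# Hypothetical workflow: each key lists the tasks that must finish first.
workflow = {
    "generate_api": {"validate_spec"},
    "generate_tests": {"generate_api"},
    "sync_docs": {"generate_api"},
    "quality_gate": {"generate_tests", "sync_docs"},
}
order = execution_order(workflow)
```

Here `generate_tests` and `sync_docs` have no ordering constraint between them, so a scheduler would be free to hand them to different agents in parallel.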
Agent Layer
Worker Agents
Execution-focused agents that perform specific tasks:
- TDD Enforcer: Ensures test-driven development practices
- API Builder: Generates API implementations from OpenAPI specs
- Doc Synchronizer: Syncs documentation to GitLab Wiki
- Code Executor: Secure sandboxed code execution
- Test Generator: AI-powered test case generation
Governor Agents
Policy enforcement and compliance agents:
- Version Sync: Maintains version consistency
- Branch Policy: Enforces git branching strategies
- OSSA Compliance Monitor: Validates OSSA adherence
- Security Policy: Enforces security standards
Critic Agents
Analysis and review agents:
- Security Auditor: Vulnerability scanning and analysis
- Performance Monitor: Performance regression detection
- Code Reviewer: Automated code quality review
- Quality Gate Enforcer: Ensures quality thresholds
Observer Agents
Monitoring and analytics agents:
- Roadmap Tracker: Monitors project progress
- Metrics Collector: Aggregates performance metrics
- System Monitor: Infrastructure health monitoring
- Network Health Checker: Agent mesh connectivity
AI/ML Layer
Anthropic Claude
- Models: Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku
- Use Cases: Code generation, analysis, reasoning
- Integration: Via LLM Gateway
OpenAI
- Models: GPT-4 Turbo, GPT-4, GPT-3.5 Turbo
- Use Cases: Text generation, embeddings, function calling
- Integration: Via LLM Gateway
Ollama (Local Models)
- Models: Llama 3, Mistral, CodeLlama
- Use Cases: Development, privacy-sensitive tasks
- Integration: Direct API or via LLM Gateway
HuggingFace
- Models: Custom fine-tuned models
- Use Cases: Domain-specific tasks
- Integration: Drupal module + Python backend
Data Layer
PostgreSQL
- Purpose: Primary relational database
- Version: 15+
- Databases:
- llm_platform: Drupal core data
- agent_brain: Agent state and knowledge
- agent_ops: Operational metrics
- workflow_engine: Workflow definitions
- compliance_engine: Compliance tracking
- Port: 5432
Redis
- Purpose: Caching, sessions, message broker
- Version: 7+
- Use Cases:
- Session storage
- API response caching
- Rate limiting
- Task queues
- Port: 6379
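To make the rate-limiting use case concrete, here is a fixed-window counter sketch. It keeps counters in an in-memory dictionary as a stand-in for Redis; a Redis-backed version would typically hold each counter in a key with an expiry so that old windows clean themselves up. The class name, the limits, and the injectable `clock` are assumptions for this example.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each time window."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        # (client, window index) -> request count.
        # In-memory stand-in: unlike keys with an expiry, old entries are never evicted.
        self.counters = {}

    def allow(self, client):
        window = int(self.clock() // self.window_seconds)
        key = (client, window)
        self.counters[key] = self.counters.get(key, 0) + 1
        return self.counters[key] <= self.limit


# Demo with a fake clock: two requests per 60-second window.
now = [0.0]
limiter = FixedWindowLimiter(limit=2, window_seconds=60, clock=lambda: now[0])
first_window = [limiter.allow("client-a") for _ in range(3)]
now[0] = 61.0  # a new window starts, so the counter effectively resets
next_window = limiter.allow("client-a")
```

Fixed windows are simple but allow short bursts at window boundaries; sliding-window or token-bucket schemes trade extra bookkeeping for smoother limits.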
Qdrant
- Purpose: Vector database for embeddings
- Version: Latest
- Features:
- Similarity search
- Semantic retrieval
- Document embeddings
- Multi-vector support
- Ports: 6333 (HTTP), 6334 (gRPC)
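Similarity search over embeddings boils down to ranking stored vectors by their closeness to a query vector. The sketch below does this by brute force with cosine similarity in plain Python; it illustrates the concept only and does not use the Qdrant client. The document IDs and three-dimensional vectors are made up, and real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, documents, k=2):
    """Return the IDs of the k documents most similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in documents.items()]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:k]]


# Toy embeddings, invented for illustration.
documents = {
    "deploy-guide": [0.9, 0.1, 0.0],
    "api-reference": [0.1, 0.9, 0.1],
    "release-notes": [0.7, 0.3, 0.1],
}
results = top_k([1.0, 0.0, 0.0], documents)
```

A vector database replaces the linear scan with an approximate nearest-neighbour index, which is what makes the same query practical over millions of vectors.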
MongoDB
- Purpose: Document storage for unstructured data
- Use Cases:
- Log aggregation
- Event sourcing
- Configuration management
- Port: 27017
Neo4j
- Purpose: Knowledge graph and correlation analysis
- Version: Latest
- Use Cases:
- Agent relationship mapping
- Root cause analysis
- Dependency tracking
- Ports: 7474 (HTTP), 7687 (Bolt)
Observability Layer
Agent Tracer
- Purpose: AI Ops intelligence and unified observability
- Technology: TypeScript, OpenTelemetry
- Components:
- ACE (AI Capabilities Engine): Performance scoring
- ATLAS (Agent Tracing & Learning Analytics): Learning analytics
- Correlation Engine: Neo4j-based correlation
- Ports: 3007 (API), 3008 (ACE), 3009 (ATLAS)
Phoenix Arize
- Purpose: AI-specific observability and LLM tracing
- Features:
- LLM call tracking
- Token usage monitoring
- Cost tracking
- Prompt analysis
- Port: 6006
Prometheus
- Purpose: Time-series metrics collection
- Features:
- Multi-dimensional metrics
- PromQL query language
- Alerting rules
- Service discovery
- Port: 9090
Grafana
- Purpose: Visualization and dashboards
- Features:
- Pre-built dashboards
- Custom visualizations
- Alerting
- Data source federation
- Port: 3000
Jaeger
- Purpose: Distributed tracing
- Features:
- Trace visualization
- Dependency graphs
- Performance analysis
- Service topology
- Port: 16686
Network Architecture
Service Communication Patterns
```mermaid
graph LR
    subgraph "External Access"
        User[End Users]
        Dev[Developers]
    end
    subgraph "Ingress Layer"
        Nginx[Nginx Ingress]
        LB[Load Balancer]
    end
    subgraph "Service Mesh"
        Gateway[API Gateway]
        Mesh[Agent Mesh gRPC]
        Services[Microservices]
    end
    subgraph "Backend Services"
        Data[Data Services]
        AI[AI Services]
        Obs[Observability]
    end
    User --> Nginx
    Dev --> LB
    Nginx --> Gateway
    LB --> Mesh
    Gateway --> Services
    Mesh --> Services
    Services --> Data
    Services --> AI
    Services --> Obs
```
Port Allocation
| Service | Primary Port | Secondary Port | Purpose |
|---|---|---|---|
| Drupal Platform | 443 (HTTPS) | - | Web UI |
| LLM Gateway | 4000 | - | AI routing |
| Agent Mesh | 3005 (REST) | 50051 (gRPC) | Agent coordination |
| Agent Tracer | 3007 | - | Observability |
| ACE | 3008 | - | Capability scoring |
| ATLAS | 3009 | - | Analytics |
| PostgreSQL | 5432 | - | Database |
| Redis | 6379 | - | Cache |
| Qdrant | 6333 (HTTP) | 6334 (gRPC) | Vector DB |
| Phoenix Arize | 6006 | 4317 (OTLP) | LLM tracing |
| Prometheus | 9090 | - | Metrics |
| Grafana | 3000 | - | Dashboards |
| Jaeger | 16686 | - | Tracing UI |
| MongoDB | 27017 | - | Document store |
| Neo4j | 7474 (HTTP) | 7687 (Bolt) | Graph DB |
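When every service runs on a single machine, as in the DDEV setup, two services claiming the same port will fail at startup. One way to guard against that is to keep the allocation as data and check it for collisions. The sketch below transcribes the table above into a dictionary; `PORTS` and `find_conflicts` are names invented for this example.

```python
from collections import Counter

# Port allocation transcribed from the table above.
PORTS = {
    "Drupal Platform": [443],
    "LLM Gateway": [4000],
    "Agent Mesh": [3005, 50051],
    "Agent Tracer": [3007],
    "ACE": [3008],
    "ATLAS": [3009],
    "PostgreSQL": [5432],
    "Redis": [6379],
    "Qdrant": [6333, 6334],
    "Phoenix Arize": [6006, 4317],
    "Prometheus": [9090],
    "Grafana": [3000],
    "Jaeger": [16686],
    "MongoDB": [27017],
    "Neo4j": [7474, 7687],
}


def find_conflicts(ports):
    """Return {port: [services]} for every port claimed by more than one service."""
    counts = Counter(port for plist in ports.values() for port in plist)
    return {
        port: sorted(service for service, plist in ports.items() if port in plist)
        for port, n in counts.items()
        if n > 1
    }


conflicts = find_conflicts(PORTS)
```

A check like this could run in CI whenever a new service is added to the allocation.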
Deployment Topologies
Development (DDEV)
- Single machine deployment
- Docker Compose orchestration
- Local DNS (.ddev.site)
- Hot reload enabled
- Debug tools accessible
Staging (Kubernetes)
- Multi-node cluster
- Helm chart deployment
- Namespaced environments
- Auto-scaling enabled
- Monitoring configured
Production (Kubernetes)
- High-availability cluster (3+ nodes)
- Multi-region deployment
- Auto-scaling and self-healing
- Full observability stack
- Disaster recovery configured
Security Architecture
Authentication & Authorization
- JWT Tokens: API authentication
- OAuth 2.0: Third-party integration
- RBAC: Role-based access control
- mTLS: Service-to-service encryption
Network Security
- Network Policies: K8s network isolation
- Ingress TLS: HTTPS termination
- Service Mesh: mTLS between services
- Firewall Rules: Port-based filtering
Data Security
- Encryption at Rest: Database encryption
- Encryption in Transit: TLS 1.3
- Secret Management: Kubernetes Secrets
- Credential Rotation: Automated rotation
Scalability & Performance
Horizontal Scaling
- Stateless Services: Scale to N replicas
- Load Balancing: Round-robin + weighted
- Auto-scaling: CPU/memory-based HPA
- Agent Pooling: Dynamic agent allocation
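For CPU- or memory-based auto-scaling, the Kubernetes Horizontal Pod Autoscaler computes the desired replica count as `ceil(currentReplicas * currentMetric / targetMetric)` and skips scaling when the ratio is close to 1. The sketch below reproduces that calculation; the replica bounds and metric values are example numbers, not the platform's configuration, and the 0.1 tolerance mirrors the Kubernetes default.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count an HPA-style controller would aim for."""
    ratio = current_metric / target_metric
    # Within the tolerance band, leave the replica count alone to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


scale_up = desired_replicas(4, current_metric=90, target_metric=60)
steady = desired_replicas(4, current_metric=63, target_metric=60)
scale_down = desired_replicas(4, current_metric=15, target_metric=60)
capped = desired_replicas(8, current_metric=120, target_metric=60)
```

With four replicas at 90% average CPU against a 60% target, the ratio is 1.5 and the controller aims for six replicas.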
Vertical Scaling
- Resource Limits: Per-service limits
- Resource Requests: Guaranteed resources
- QoS Classes: Guaranteed, Burstable, BestEffort
Performance Optimization
- Caching: Multi-layer caching (Redis, CDN)
- Connection Pooling: Database connections
- Request Batching: Batch AI requests
- Compression: Response compression
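Request batching reduces per-call overhead by grouping several prompts into one upstream request. The sketch below shows only the grouping step; `batch` and `send_batch` are hypothetical names, and `send_batch` is a stand-in that counts calls instead of contacting a provider.

```python
def batch(items, batch_size):
    """Split a list into consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


calls = []


def send_batch(prompts):
    """Stand-in for one upstream request carrying several prompts."""
    calls.append(len(prompts))
    return [f"reply to: {p}" for p in prompts]


prompts = [f"prompt {n}" for n in range(7)]
replies = [reply for chunk in batch(prompts, 3) for reply in send_batch(chunk)]
```

Seven prompts with a batch size of three produce three upstream calls instead of seven, while the replies stay in the original order.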
Related Documentation
- BuildKit Architecture - Detailed BAR runtime architecture
- Agent Mesh Architecture - Coordination layer details
- LLM Gateway - Multi-provider routing
- MCP Registry - Model Context Protocol
- Vector Database - Qdrant integration
- Agent Tracer - Observability platform
- Kubernetes Deployment
- DDEV Development
Next Steps
- For Developers: Start with DDEV Development
- For Operations: Review Kubernetes Setup
- For Architects: Dive into BuildKit Architecture
- For Monitoring: Explore Agent Tracer