System Architecture Overview

Complete architectural overview of the LLM Platform ecosystem

High-Level Architecture

The LLM Platform is an enterprise-grade AI orchestration system built on a distributed microservices architecture, providing unified management for multiple AI providers, vector databases, and workflow automation.

graph TB
    subgraph "Frontend Layer"
        UI[Drupal 11 UI]
        CLI[BuildKit CLI]
        IDE[AgentStudio IDE]
    end

    subgraph "API Gateway Layer"
        Gateway[LLM Gateway<br/>Multi-Provider Routing]
        MCP[MCP Registry<br/>Model Context Protocol]
        API[API Normalizer<br/>Standardization Layer]
    end

    subgraph "Orchestration Layer"
        BuildKit[Agent BuildKit<br/>BAR Runtime]
        Mesh[Agent Mesh<br/>Coordination Layer]
        Orchestra[AI Agent Orchestra<br/>Workflow Engine]
    end

    subgraph "Agent Layer"
        Workers[Worker Agents<br/>TDD, API Builder, Doc Sync]
        Governors[Governor Agents<br/>Version Sync, Branch Policy]
        Critics[Critic Agents<br/>Security, Performance, Quality]
        Observers[Observer Agents<br/>Metrics, Roadmap, Health]
    end

    subgraph "AI/ML Layer"
        Anthropic[Anthropic Claude]
        OpenAI[OpenAI GPT]
        Ollama[Ollama Local Models]
        HuggingFace[HuggingFace Models]
    end

    subgraph "Data Layer"
        Postgres[(PostgreSQL<br/>Relational Data)]
        Redis[(Redis<br/>Cache & Sessions)]
        Qdrant[(Qdrant<br/>Vector DB)]
        MongoDB[(MongoDB<br/>Document Store)]
        Neo4j[(Neo4j<br/>Knowledge Graph)]
    end

    subgraph "Observability Layer"
        Tracer[Agent Tracer<br/>AI Ops Intelligence]
        Phoenix[Phoenix Arize<br/>LLM Tracing]
        Prometheus[Prometheus<br/>Metrics]
        Grafana[Grafana<br/>Dashboards]
        Jaeger[Jaeger<br/>Distributed Tracing]
    end

    UI --> Gateway
    CLI --> BuildKit
    IDE --> Mesh

    Gateway --> Orchestra
    MCP --> Mesh
    API --> Gateway

    Orchestra --> Mesh
    BuildKit --> Workers
    BuildKit --> Governors
    BuildKit --> Critics
    BuildKit --> Observers

    Mesh --> Workers
    Mesh --> Governors
    Mesh --> Critics
    Mesh --> Observers

    Workers --> Anthropic
    Workers --> OpenAI
    Workers --> Ollama
    Workers --> HuggingFace

    Orchestra --> Postgres
    Orchestra --> Redis
    Orchestra --> Qdrant
    Orchestra --> MongoDB
    Orchestra --> Neo4j

    Mesh --> Tracer
    Workers --> Phoenix
    Tracer --> Prometheus
    Prometheus --> Grafana
    Tracer --> Jaeger

Core Architectural Principles

1. Separation of Concerns

Frontend Layer: User interfaces (Drupal, CLI, IDE)
API Gateway Layer: Request routing and normalization
Orchestration Layer: Workflow coordination and agent management
Agent Layer: Autonomous task execution
Data Layer: Persistent storage and caching
Observability Layer: Monitoring and analytics

2. Distributed by Design

Microservices architecture with clear service boundaries
gRPC for high-performance inter-service communication
REST APIs for human-friendly interfaces
Event-driven architecture using message queues

3. OSSA Compliance

Open Standards for Scalable Agents (OSSA 1.0)
Standardized agent manifests and capabilities
Cross-platform agent interoperability
Protocol-based communication

4. Observability First

Distributed tracing for all operations
Real-time metrics collection
Structured logging with correlation
AI-specific observability (Phoenix Arize)
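
As an illustration of structured logging with correlation, the sketch below binds a logger to a service name and a correlation ID and writes one JSON object per line. The record fields, the `createLogger` helper, and the `sink` callback are assumptions made for this example, not the platform's actual logging API.

```typescript
// Hypothetical log record; the field names are assumptions for illustration.
interface LogRecord {
  timestamp: string;
  level: "info" | "error";
  service: string;
  correlationId: string;
  message: string;
}

interface Logger {
  info(message: string): void;
  error(message: string): void;
  child(service: string): Logger;
}

// Bind a logger to one service and one correlation ID so that every line
// written while handling a request can be joined across services later.
function createLogger(
  service: string,
  correlationId: string,
  sink: (line: string) => void,
): Logger {
  const write = (level: LogRecord["level"], message: string): void => {
    const record: LogRecord = {
      timestamp: new Date().toISOString(),
      level,
      service,
      correlationId,
      message,
    };
    sink(JSON.stringify(record)); // one JSON object per line
  };
  return {
    info: (message) => write("info", message),
    error: (message) => write("error", message),
    // A downstream service gets its own name but keeps the same correlation ID.
    child: (childService) => createLogger(childService, correlationId, sink),
  };
}
```

Passing the sink in as a parameter keeps the sketch independent of any particular log transport; in practice it could be a function that writes to standard output.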

Component Overview

Frontend Layer

Drupal 11 Platform

Purpose: Enterprise CMS and administrative UI
Technology: PHP 8.3+, Symfony components
Features:
Multi-site management
Content workflow automation
User authentication and RBAC
Custom modules for AI integration
URL: https://llm-platform.ddev.site

BuildKit CLI

Purpose: Command-line orchestration and automation
Technology: TypeScript, Node.js 20+
Features:
Agent lifecycle management
Workflow orchestration (ROE)
Real-time streaming (VORTEX v3)
Quality testing (QITS)
Resource management (SWARM)
Command: buildkit --help

AgentStudio IDE

Purpose: Multi-platform development environment
Technology: VSCode extension, web-based UI
Features:
AI-assisted coding
Agent integration
Real-time collaboration
Test-driven development

API Gateway Layer

LLM Gateway

Purpose: Unified multi-provider AI routing
Technology: Node.js, Express, TypeScript
Providers: Anthropic, OpenAI, Google, Cohere, local models
Features:
Intelligent routing and failover
Cost optimization
Rate limiting
Request/response caching
Port: 4000
Endpoint: http://localhost:4000/api/v1
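
One way to picture routing with failover and cost optimization is sketched below: healthy providers are ordered cheapest-first, and the caller walks that list until one attempt succeeds. The `ProviderStatus` shape, the cost unit, and both function names are hypothetical; the gateway's real routing policy is not described here. Real provider calls would be asynchronous; the sketch stays synchronous to keep the control flow visible.

```typescript
// Hypothetical provider descriptor; field names and units are assumptions.
interface ProviderStatus {
  name: string;
  healthy: boolean;
  costPer1kTokens: number;
}

// Return the names of healthy providers, cheapest first. The caller tries
// them in order, so each entry is the failover target for the one before it.
function planRoute(providers: ProviderStatus[]): string[] {
  return providers
    .filter((p) => p.healthy)
    .sort((a, b) => a.costPer1kTokens - b.costPer1kTokens)
    .map((p) => p.name);
}

// Walk the plan until one attempt succeeds. `attempt` stands in for the real
// provider call and is expected to throw on failure.
function callWithFailover<T>(plan: string[], attempt: (provider: string) => T): T {
  let lastError: unknown = new Error("no healthy providers available");
  for (const provider of plan) {
    try {
      return attempt(provider);
    } catch (err) {
      lastError = err; // remember the failure and fall through to the next provider
    }
  }
  throw lastError;
}
```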

MCP Registry

Purpose: Model Context Protocol service management
Technology: TypeScript, JSON-RPC 2.0
Features:
Tool registration and discovery
Context sharing across agents
Protocol compliance validation
Version negotiation
Specification: Model Context Protocol (MCP) v1.0
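
To make tool registration and discovery over JSON-RPC 2.0 concrete, here is a minimal in-memory sketch. The request and response envelopes follow JSON-RPC 2.0, and `tools/list` is the discovery method name used by the Model Context Protocol. The `ToolRegistry` class, its `register` method, and the reduced tool descriptor are assumptions for illustration only, not the registry's actual interface.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
  params?: unknown;
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number | string;
  result?: unknown;
  error?: { code: number; message: string };
}

// Reduced tool descriptor (assumption): real descriptors carry more fields.
interface ToolDescriptor {
  name: string;
  description: string;
}

class ToolRegistry {
  private tools = new Map<string, ToolDescriptor>();

  // Registering a tool under an existing name replaces the earlier entry.
  register(tool: ToolDescriptor): void {
    this.tools.set(tool.name, tool);
  }

  handle(request: JsonRpcRequest): JsonRpcResponse {
    if (request.method === "tools/list") {
      const tools: ToolDescriptor[] = [];
      this.tools.forEach((tool) => tools.push(tool));
      return { jsonrpc: "2.0", id: request.id, result: { tools } };
    }
    // -32601 is the JSON-RPC 2.0 "Method not found" error code.
    return {
      jsonrpc: "2.0",
      id: request.id,
      error: { code: -32601, message: "Method not found" },
    };
  }
}
```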

API Normalizer

Purpose: Standardize requests/responses across providers
Technology: Drupal module, PHP
Features:
Provider-agnostic interfaces
Schema validation
Response transformation
Error normalization
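
The idea behind a provider-agnostic interface can be sketched as two small adapters that map provider-specific response bodies onto one shape. The actual normalizer is a Drupal module written in PHP; TypeScript is used here only for consistency with the other examples. The input types are simplified subsets of OpenAI-style and Anthropic-style responses, and `NormalizedResponse` is a hypothetical target shape.

```typescript
// Hypothetical provider-agnostic shape.
interface NormalizedResponse {
  provider: string;
  text: string;
  inputTokens: number;
  outputTokens: number;
}

// Simplified subset of an OpenAI-style chat completion body.
interface OpenAiStyleResponse {
  choices: { message: { content: string } }[];
  usage: { prompt_tokens: number; completion_tokens: number };
}

// Simplified subset of an Anthropic-style messages body.
interface AnthropicStyleResponse {
  content: { type: string; text?: string }[];
  usage: { input_tokens: number; output_tokens: number };
}

function fromOpenAi(raw: OpenAiStyleResponse): NormalizedResponse {
  const first = raw.choices[0];
  return {
    provider: "openai",
    text: first ? first.message.content : "",
    inputTokens: raw.usage.prompt_tokens,
    outputTokens: raw.usage.completion_tokens,
  };
}

function fromAnthropic(raw: AnthropicStyleResponse): NormalizedResponse {
  return {
    provider: "anthropic",
    // Concatenate text blocks and ignore any non-text blocks.
    text: raw.content
      .filter((block) => block.type === "text")
      .map((block) => block.text ?? "")
      .join(""),
    inputTokens: raw.usage.input_tokens,
    outputTokens: raw.usage.output_tokens,
  };
}
```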

Orchestration Layer

Agent BuildKit (BAR Runtime)

Purpose: Enterprise autonomous agent platform
Technology: TypeScript, Kubernetes, Helm
Components:
ROE (Runtime Orchestration Engine): Multi-agent coordination
VORTEX v3: Real-time streaming and vector operations
QITS: AI-powered quality intelligence testing
SWARM: Dynamic resource management and scaling
Features:
Sequential thinking workflow (8 stages)
GitLab CE integration
25+ OSSA-compliant agents
API-first architecture with 30+ endpoints

Agent Mesh

Purpose: Backend coordination layer for distributed agents
Technology: gRPC, Protocol Buffers, WebSocket
Features:
Agent-to-agent communication
Load balancing and routing
Health monitoring and failover
Circuit breaking
mTLS security
Ports: 3005 (REST), 50051 (gRPC)
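
Circuit breaking can be illustrated with a small state machine: after a number of consecutive failures the breaker opens and rejects calls, then allows a single trial call once a cool-down has elapsed. The class below is a generic sketch of that pattern, not the Agent Mesh implementation; the threshold, the cool-down, and the injectable clock are assumptions chosen to keep the example testable.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetAfterMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, resetAfterMs: number, now: () => number = () => Date.now()) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now;
  }

  // An open breaker becomes half-open once the cool-down has elapsed.
  getState(): BreakerState {
    if (this.state === "open" && this.now() - this.openedAt >= this.resetAfterMs) {
      this.state = "half-open";
    }
    return this.state;
  }

  call<T>(operation: () => T): T {
    if (this.getState() === "open") {
      throw new Error("circuit open: call rejected");
    }
    try {
      const result = operation();
      this.failures = 0; // any success closes the circuit again
      this.state = "closed";
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed trial call, or too many consecutive failures, opens the circuit.
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```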

AI Agent Orchestra

Purpose: Workflow automation and agent coordination
Technology: Drupal module, PHP, Temporal
Features:
Workflow definition and execution
Agent task assignment
Dependency resolution
State management
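
Dependency resolution for workflow tasks is commonly done with a topological sort. The sketch below uses a Kahn-style approach that repeatedly releases every task whose dependencies are already satisfied. It is a generic illustration, not the Orchestra's actual engine (which is listed above as a Drupal module with Temporal), and the task names in the usage note are made up.

```typescript
// `dependencies` maps each task ID to the IDs it must wait for.
function resolveOrder(dependencies: Record<string, string[]>): string[] {
  const remaining = new Map<string, Set<string>>();
  for (const task of Object.keys(dependencies)) {
    for (const dep of dependencies[task]) {
      if (!Object.prototype.hasOwnProperty.call(dependencies, dep)) {
        throw new Error(`unknown dependency: ${dep}`);
      }
    }
    remaining.set(task, new Set(dependencies[task]));
  }

  const order: string[] = [];
  while (remaining.size > 0) {
    // Collect every task with no unmet dependencies.
    const ready: string[] = [];
    remaining.forEach((deps, task) => {
      if (deps.size === 0) ready.push(task);
    });
    if (ready.length === 0) {
      throw new Error("dependency cycle detected");
    }
    ready.sort(); // deterministic order within one round
    for (const task of ready) {
      remaining.delete(task);
      order.push(task);
    }
    // Mark the released tasks as satisfied for everything still waiting.
    remaining.forEach((deps) => {
      for (const task of ready) deps.delete(task);
    });
  }
  return order;
}
```

For example, `resolveOrder({ deploy: ["test"], test: ["build"], build: [], lint: [] })` releases `build` and `lint` first, then `test`, then `deploy`. Tasks released in the same round have no dependencies on each other and could run in parallel.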

Agent Layer

Worker Agents

Execution-focused agents that perform specific tasks:
TDD Enforcer: Ensures test-driven development practices
API Builder: Generates API implementations from OpenAPI specs
Doc Synchronizer: Syncs documentation to GitLab Wiki
Code Executor: Secure sandboxed code execution
Test Generator: AI-powered test case generation

Governor Agents

Policy enforcement and compliance agents:
Version Sync: Maintains version consistency
Branch Policy: Enforces git branching strategies
OSSA Compliance Monitor: Validates OSSA adherence
Security Policy: Enforces security standards

Critic Agents

Analysis and review agents:
Security Auditor: Vulnerability scanning and analysis
Performance Monitor: Performance regression detection
Code Reviewer: Automated code quality review
Quality Gate Enforcer: Ensures quality thresholds

Observer Agents

Monitoring and analytics agents:
Roadmap Tracker: Monitors project progress
Metrics Collector: Aggregates performance metrics
System Monitor: Infrastructure health monitoring
Network Health Checker: Agent mesh connectivity

AI/ML Layer

Anthropic Claude

Models: Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku
Use Cases: Code generation, analysis, reasoning
Integration: Via LLM Gateway

OpenAI

Models: GPT-4 Turbo, GPT-4, GPT-3.5 Turbo
Use Cases: Text generation, embeddings, function calling
Integration: Via LLM Gateway

Ollama (Local Models)

Models: Llama 3, Mistral, CodeLlama
Use Cases: Development, privacy-sensitive tasks
Integration: Direct API or via LLM Gateway

HuggingFace

Models: Custom fine-tuned models
Use Cases: Domain-specific tasks
Integration: Drupal module + Python backend

Data Layer

PostgreSQL

Purpose: Primary relational database
Version: 15+
Databases:
llm_platform: Drupal core data
agent_brain: Agent state and knowledge
agent_ops: Operational metrics
workflow_engine: Workflow definitions
compliance_engine: Compliance tracking
Port: 5432

Redis

Purpose: Caching, sessions, message broker
Version: 7+
Use Cases:
Session storage
API response caching
Rate limiting
Task queues
Port: 6379

Qdrant

Purpose: Vector database for embeddings
Version: Latest
Features:
Similarity search
Semantic retrieval
Document embeddings
Multi-vector support
Ports: 6333 (HTTP), 6334 (gRPC)
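
Similarity search ranks stored embeddings by how close they are to a query vector. The brute-force sketch below shows one common scoring function, cosine similarity, together with a top-k selection. It illustrates the scoring idea only: it is not how Qdrant executes queries (Qdrant uses an approximate nearest-neighbour index) and it does not use the Qdrant client API. The `EmbeddedDocument` shape is an assumption.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|); defined as 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical stored item: an ID plus its embedding.
interface EmbeddedDocument {
  id: string;
  vector: number[];
}

// Score every document against the query and keep the k best matches.
function topK(query: number[], docs: EmbeddedDocument[], k: number): { id: string; score: number }[] {
  return docs
    .map((doc) => ({ id: doc.id, score: cosineSimilarity(query, doc.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```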

MongoDB

Purpose: Document storage for unstructured data
Use Cases:
Log aggregation
Event sourcing
Configuration management
Port: 27017

Neo4j

Purpose: Knowledge graph and correlation analysis
Version: Latest
Use Cases:
Agent relationship mapping
Root cause analysis
Dependency tracking
Ports: 7474 (HTTP), 7687 (Bolt)

Observability Layer

Agent Tracer

Purpose: AI Ops intelligence and unified observability
Technology: TypeScript, OpenTelemetry
Components:
ACE (AI Capabilities Engine): Performance scoring
ATLAS (Agent Tracing & Learning Analytics): Learning analytics
Correlation Engine: Neo4j-based correlation
Ports: 3007 (API), 3008 (ACE), 3009 (ATLAS)

Phoenix Arize

Purpose: AI-specific observability and LLM tracing
Features:
LLM call tracking
Token usage monitoring
Cost tracking
Prompt analysis
Port: 6006

Prometheus

Purpose: Time-series metrics collection
Features:
Multi-dimensional metrics
PromQL query language
Alerting rules
Service discovery
Port: 9090

Grafana

Purpose: Visualization and dashboards
Features:
Pre-built dashboards
Custom visualizations
Alerting
Data source federation
Port: 3000

Jaeger

Purpose: Distributed tracing
Features:
Trace visualization
Dependency graphs
Performance analysis
Service topology
Port: 16686

Network Architecture

Service Communication Patterns

graph LR
    subgraph "External Access"
        User[End Users]
        Dev[Developers]
    end

    subgraph "Ingress Layer"
        Nginx[Nginx Ingress]
        LB[Load Balancer]
    end

    subgraph "Service Mesh"
        Gateway[API Gateway]
        Mesh[Agent Mesh gRPC]
        Services[Microservices]
    end

    subgraph "Backend Services"
        Data[Data Services]
        AI[AI Services]
        Obs[Observability]
    end

    User --> Nginx
    Dev --> LB
    Nginx --> Gateway
    LB --> Mesh
    Gateway --> Services
    Mesh --> Services
    Services --> Data
    Services --> AI
    Services --> Obs

Port Allocation

Service	Primary Port	Secondary Port	Purpose
Drupal Platform	443 (HTTPS)	-	Web UI
LLM Gateway	4000 (HTTP)	-	AI routing
Agent Mesh	3005 (REST)	50051 (gRPC)	Agent coordination
Agent Tracer	3007 (API)	-	Observability
ACE	3008	-	Capability scoring
ATLAS	3009	-	Analytics
PostgreSQL	5432	-	Database
Redis	6379	-	Cache
Qdrant	6333 (HTTP)	6334 (gRPC)	Vector DB
Phoenix Arize	6006	4317 (OTLP)	LLM tracing
Prometheus	9090	-	Metrics
Grafana	3000	-	Dashboards
Jaeger	16686	-	Tracing UI
MongoDB	27017	-	Document store
Neo4j	7474 (HTTP)	7687 (Bolt)	Graph DB
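
When services are added, a port table like this is easy to break by accident. The sketch below encodes the allocation above as data and reports any port claimed by two different services. The `PortAssignment` type and the `findPortConflicts` helper are hypothetical names introduced for this example; the port numbers are copied from the table.

```typescript
interface PortAssignment {
  service: string;
  port: number;
}

// Port numbers copied from the allocation table above.
const PORT_ALLOCATION: PortAssignment[] = [
  { service: "Drupal Platform", port: 443 },
  { service: "LLM Gateway", port: 4000 },
  { service: "Agent Mesh", port: 3005 },
  { service: "Agent Mesh", port: 50051 },
  { service: "Agent Tracer", port: 3007 },
  { service: "ACE", port: 3008 },
  { service: "ATLAS", port: 3009 },
  { service: "PostgreSQL", port: 5432 },
  { service: "Redis", port: 6379 },
  { service: "Qdrant", port: 6333 },
  { service: "Qdrant", port: 6334 },
  { service: "Phoenix Arize", port: 6006 },
  { service: "Phoenix Arize", port: 4317 },
  { service: "Prometheus", port: 9090 },
  { service: "Grafana", port: 3000 },
  { service: "Jaeger", port: 16686 },
  { service: "MongoDB", port: 27017 },
  { service: "Neo4j", port: 7474 },
  { service: "Neo4j", port: 7687 },
];

// Return each port that is claimed by more than one distinct service.
function findPortConflicts(assignments: PortAssignment[]): number[] {
  const owners = new Map<number, string>();
  const conflicts: number[] = [];
  for (const { service, port } of assignments) {
    const owner = owners.get(port);
    if (owner === undefined) {
      owners.set(port, service);
    } else if (owner !== service && conflicts.indexOf(port) === -1) {
      conflicts.push(port);
    }
  }
  return conflicts;
}
```

A check of this kind could run in CI so that a new service claiming, say, port 3000 is flagged before it collides with Grafana.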

Deployment Topologies

Development (DDEV)

Single machine deployment
Docker Compose orchestration
Local DNS (.ddev.site)
Hot reload enabled
Debug tools accessible

Staging (Kubernetes)

Multi-node cluster
Helm chart deployment
Namespaced environments
Auto-scaling enabled
Monitoring configured

Production (Kubernetes)

High-availability cluster (3+ nodes)
Multi-region deployment
Auto-scaling and self-healing
Full observability stack
Disaster recovery configured

Security Architecture

Authentication & Authorization

JWT Tokens: API authentication
OAuth 2.0: Third-party integration
RBAC: Role-based access control
mTLS: Mutual service-to-service authentication and encryption
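
A role-based access check reduces to asking whether any of the caller's roles grants the requested permission. The roles and permission strings below are invented for illustration, since the platform's real role model is not specified here. The sketch also assumes the caller's roles have already been extracted from a verified JWT, a step it does not show.

```typescript
// Hypothetical roles and permissions, for illustration only.
type Role = "admin" | "developer" | "viewer";

const ROLE_PERMISSIONS: Record<Role, string[]> = {
  admin: ["agents:read", "agents:deploy", "workflows:run"],
  developer: ["agents:read", "workflows:run"],
  viewer: ["agents:read"],
};

// A caller is allowed when at least one of its roles grants the permission.
// A caller with no roles is always denied.
function isAllowed(roles: Role[], permission: string): boolean {
  return roles.some((role) => ROLE_PERMISSIONS[role].indexOf(permission) !== -1);
}
```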

Network Security

Network Policies: K8s network isolation
Ingress TLS: HTTPS termination
Service Mesh: mTLS between services
Firewall Rules: Port-based filtering

Data Security

Encryption at Rest: Database encryption
Encryption in Transit: TLS 1.3
Secret Management: Kubernetes Secrets
Credential Rotation: Automated rotation

Scalability & Performance

Horizontal Scaling

Stateless Services: Scale to N replicas
Load Balancing: Round-robin + weighted
Auto-scaling: CPU/memory-based HPA
Agent Pooling: Dynamic agent allocation
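
For CPU/memory-based auto-scaling, the Kubernetes Horizontal Pod Autoscaler documents its core calculation as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), skipping changes when the ratio is within a tolerance (0.1 by default). The function below is a simplified sketch of that calculation with min/max clamping; it ignores stabilization windows, pod readiness, and multiple metrics. The function and parameter names are chosen for this example.

```typescript
function desiredReplicas(
  currentReplicas: number,
  currentMetric: number, // e.g. observed average CPU utilization in percent
  targetMetric: number,  // e.g. target average CPU utilization in percent
  minReplicas: number,
  maxReplicas: number,
  tolerance: number = 0.1,
): number {
  const clamp = (n: number): number => Math.min(maxReplicas, Math.max(minReplicas, n));
  const ratio = currentMetric / targetMetric;
  // Within tolerance of the target: keep the current replica count.
  if (Math.abs(ratio - 1) <= tolerance) {
    return clamp(currentReplicas);
  }
  return clamp(Math.ceil(currentReplicas * ratio));
}
```

With 4 replicas at 90% CPU against a 60% target, the ratio is 1.5 and the sketch returns 6 replicas, assuming the maximum allows it.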

Vertical Scaling

Resource Limits: Per-service limits
Resource Requests: Guaranteed resources
QoS Classes: Guaranteed, Burstable, BestEffort

Performance Optimization

Caching: Multi-layer caching (Redis, CDN)
Connection Pooling: Database connections
Request Batching: Batch AI requests
Compression: Response compression
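
Request batching groups individual requests so that fewer round trips are made to a provider. The sketch below shows only the grouping step, as a generic chunking helper. The helper name and the batch size are assumptions; whether and how a given provider accepts batched requests is outside the scope of this example.

```typescript
// Split `items` into consecutive batches of at most `maxBatchSize` elements.
function toBatches<T>(items: T[], maxBatchSize: number): T[][] {
  if (!Number.isInteger(maxBatchSize) || maxBatchSize < 1) {
    throw new Error("maxBatchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += maxBatchSize) {
    batches.push(items.slice(start, start + maxBatchSize));
  }
  return batches;
}
```

Each resulting batch would then be sent as one request, with the responses mapped back to the original callers by position.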

Related Documentation

BuildKit Architecture - Detailed BAR runtime architecture
Agent Mesh Architecture - Coordination layer details
LLM Gateway - Multi-provider routing
MCP Registry - Model Context Protocol
Vector Database - Qdrant integration
Agent Tracer - Observability platform
Kubernetes Deployment
DDEV Development

Next Steps

For Developers: Start with DDEV Development
For Operations: Review Kubernetes Setup
For Architects: Dive into BuildKit Architecture
For Monitoring: Explore Agent Tracer