Agent Chat - Architecture
System Design, Components, and Integration Patterns
Overview
Agent Chat implements a microservices-based architecture with real-time WebSocket communication, distributed state management, and deep integration with the LLM Platform ecosystem. The system is designed for enterprise scalability, observability, and LibreChat API compatibility.
System Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Client Layer │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ React UI │ │ WebSocket │ │ Claude Desktop MCP │ │
│ │ (Browser) │ │ Client │ │ Integration │ │
│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ API Gateway Layer │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ Express │ │ Socket.IO │ │ Apollo GraphQL │ │
│ │ REST API │ │ WebSocket │ │ Server │ │
│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
┌───────────────────┼───────────────────┐
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐
│ Service │ │ Agent OS │ │ LibreChat Enhanced │
│ Layer │ │ Layer │ │ Integration │
└──────────────┘ └──────────────┘ └──────────────────────┘
│ │ │
└───────────────────┼───────────────────┘
▼
┌─────────────────────────────────────────────────────────────────┐
│ Integration Layer │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ LLM Gateway │ │ Agent Mesh │ │ Knowledge Graph │ │
│ │ (Routing) │ │ Coordinator │ │ (Neo4j) │ │
│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
┌───────────────────┼───────────────────┐
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐
│ PostgreSQL │ │ Redis │ │ Qdrant │
│ (Persistent)│ │ (Sessions) │ │ (Vector Search) │
└──────────────┘ └──────────────┘ └──────────────────────┘
Core Components
1. HTTP/REST API Server (Express)
File: src/server.js
Responsibilities:
- HTTP request handling and routing
- Middleware stack (security, CORS, compression)
- Health checks and API info endpoints
- Integration with ecosystem services
Key Features:
- Helmet security headers
- CORS configuration for multi-origin support
- Compression for response optimization
- Request logging with Winston
- Graceful shutdown handling
Endpoints:
GET /health // Service health check
GET /api/info // API information and features
POST /api/chat // Chat completion
POST /api/search // Vector search
POST /api/agents // Agent orchestration
GET /metrics // Prometheus metrics
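The /health endpoint reduces to a small payload builder. A minimal sketch, assuming the handler summarizes dependency checks into a single status flag; `buildHealthStatus`, its field names, and the service label are illustrative, not the actual shape in src/server.js:

```typescript
// Hypothetical shape of the GET /health response payload.
interface HealthStatus {
  status: 'ok' | 'degraded';
  uptimeSeconds: number;
  service: string;
}

// Dependency checks (Redis, PostgreSQL, Qdrant) are collapsed
// into one boolean for this sketch.
function buildHealthStatus(
  uptimeSeconds: number,
  dependenciesHealthy: boolean,
): HealthStatus {
  return {
    status: dependenciesHealthy ? 'ok' : 'degraded',
    uptimeSeconds,
    service: 'agent-chat',
  };
}
```

A load balancer would treat anything other than `status: 'ok'` as a signal to drain the instance.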
2. WebSocket Layer (Socket.IO)
File: src/websocket/index.js
Responsibilities:
- Real-time bidirectional communication
- Streaming LLM responses
- Live collaboration features
- Voice assistant integration
- Real-time progress broadcasts
Event Types:
// Client → Server
'chat_message' // User sends message
'voice_command' // Voice input from Echo
'agent_action' // Agent task request
'subscription' // Subscribe to updates
// Server → Client
'chat_response' // AI response stream
'voice_feedback' // Voice assistant reply
'agent_progress' // Real-time task updates
'collaboration' // Multi-user events
Features:
- Connection pooling and management
- Automatic reconnection
- Room-based broadcasts
- Message queuing and delivery guarantees
- CORS and origin validation
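The client→server and server→client event types above pair up one-to-one; a typed map makes the pairing explicit. The mapping itself is an illustration inferred from the event lists, not a structure that exists in src/websocket/index.js:

```typescript
type ClientEvent = 'chat_message' | 'voice_command' | 'agent_action' | 'subscription';
type ServerEvent = 'chat_response' | 'voice_feedback' | 'agent_progress' | 'collaboration';

// Each inbound event is answered on a corresponding outbound channel.
const responseChannel: Record<ClientEvent, ServerEvent> = {
  chat_message: 'chat_response',
  voice_command: 'voice_feedback',
  agent_action: 'agent_progress',
  subscription: 'collaboration',
};
```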
3. GraphQL API Server (Apollo)
File: src/api/server.ts
Responsibilities:
- Structured data queries and mutations
- Real-time subscriptions
- Type-safe API schema
- Federation with other microservices
Schema Highlights:
type Query {
conversations(limit: Int, offset: Int): [Conversation!]!
conversation(id: ID!): Conversation
models: [ModelInfo!]!
searchConversations(query: String!): [Conversation!]!
agentHealth(agentId: ID!): AgentHealth!
}
type Mutation {
sendMessage(input: MessageInput!): Message!
createConversation(title: String, model: String): Conversation!
deleteConversation(id: ID!): Boolean!
updateUserPreferences(input: PreferencesInput!): User!
}
type Subscription {
messageStream(conversationId: ID!): Message!
agentProgress(agentId: ID!): ProgressUpdate!
collaborationEvent(roomId: ID!): CollaborationEvent!
}
State Management
5-Layer Enterprise Memory System
Architecture: src/agent-os/memory/EnterpriseMemoryPlugin.ts
Layer 1: Conversation Memory (Redis)
- Purpose: Short-term context for active sessions
- TTL: 1 hour (configurable)
- Storage: Serialized conversation turns
- Access Pattern: Read-heavy, write-on-message
interface ConversationMemory {
sessionId: string;
messages: Message[];
turnCount: number;
activeContext: Record<string, any>;
lastActivity: Date;
}
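Layer 1 access reduces to a Redis key scheme plus JSON serialization written with the 1-hour TTL. A sketch under assumptions: the `conv:<sessionId>` key format and helper names are hypothetical, not taken from EnterpriseMemoryPlugin.ts:

```typescript
const CONVERSATION_TTL_SECONDS = 3600; // 1 hour, configurable per the layer spec

// Hypothetical key scheme for Layer 1 entries.
function conversationKey(sessionId: string): string {
  return `conv:${sessionId}`;
}

// Serialize the turn set for a Redis SET ... EX CONVERSATION_TTL_SECONDS write.
function serializeConversation(memory: { sessionId: string; turnCount: number }): string {
  return JSON.stringify(memory);
}
```

The read-heavy access pattern then becomes a single GET + parse on each message, with the TTL refreshed on write.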
Layer 2: User Memory (PostgreSQL)
- Purpose: User preferences, history, satisfaction scores
- Persistence: Permanent
- Storage: Relational tables with indexes
- Access Pattern: Read on session start, write on feedback
interface UserMemory {
userId: string;
preferences: ModelPreferences;
satisfactionScore: number;
totalConversations: number;
topics: string[];
learningHistory: LearningEvent[];
}
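The `satisfactionScore` field can be maintained as a running average over `totalConversations`, written once per feedback event. The incremental formula below is an assumption about how the score is kept, not something stated in the source:

```typescript
// Fold one new per-conversation score into the running average.
function updateSatisfaction(
  currentScore: number,
  totalConversations: number,
  newScore: number,
): { satisfactionScore: number; totalConversations: number } {
  const total = totalConversations + 1;
  return {
    satisfactionScore: (currentScore * totalConversations + newScore) / total,
    totalConversations: total,
  };
}
```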
Layer 3: Knowledge Memory (Qdrant)
- Purpose: Semantic search and RAG
- Vectors: 384-dimensional embeddings
- Storage: HNSW index for fast similarity
- Access Pattern: Query on every message for context
interface KnowledgeMemory {
vectors: Float32Array;
metadata: {
source: string;
timestamp: Date;
relevanceScore: number;
};
facts: string[];
confidence: number;
}
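The similarity behind the HNSW lookup is typically cosine similarity over the 384-dimensional embeddings. Qdrant computes this server-side, but the math is simple enough to sketch for intuition:

```typescript
// Cosine similarity between two embeddings; 1.0 means identical direction,
// 0.0 means orthogonal (unrelated) vectors.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```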
Layer 4: Learning Memory (PostgreSQL)
- Purpose: Continuous improvement from feedback
- Storage: Interaction corrections, patterns, improvements
- Processing: Async background jobs
- Access Pattern: Write-heavy, periodic batch reads
interface LearningMemory {
interactions: Interaction[];
patterns: Pattern[];
improvements: Improvement[];
successMetrics: {
accuracy: number;
satisfactionGain: number;
};
}
Layer 5: Performance Memory (Prometheus)
- Purpose: System metrics and health
- Retention: 30 days
- Metrics: Latency, throughput, error rates, token usage
- Access Pattern: Continuous time-series writes
interface PerformanceMemory {
latency: Histogram;
throughput: Counter;
errorRate: Gauge;
tokenUsage: Counter;
modelDistribution: Summary;
}
Agent OS Integration
VORTEX v3 Token Optimization
Implementation: src/agent-os/core/AgentLauncherEndpoint.ts
Mechanism:
1. Context Compression: Semantic summarization of conversation history
2. Smart Truncation: Remove low-relevance turns while preserving context
3. Template Optimization: Reusable prompt templates
4. Caching: Redis cache for repeated queries
Performance:
- 30-50% token reduction
- <100ms overhead for compression
- Maintains context quality (95%+ retention)
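The smart-truncation step can be sketched as a greedy keep-by-relevance pass under a token budget. The `Turn` shape, relevance scores, and greedy strategy here are illustrative assumptions, not the VORTEX implementation in AgentLauncherEndpoint.ts:

```typescript
interface Turn {
  text: string;
  relevance: number; // 0..1, higher = more useful context
  tokens: number;
}

// Keep the highest-relevance turns that fit the token budget,
// then restore original conversation order.
function truncateToBudget(turns: Turn[], budget: number): Turn[] {
  const byRelevance = [...turns].sort((a, b) => b.relevance - a.relevance);
  const kept = new Set<Turn>();
  let used = 0;
  for (const turn of byRelevance) {
    if (used + turn.tokens <= budget) {
      kept.add(turn);
      used += turn.tokens;
    }
  }
  return turns.filter((t) => kept.has(t));
}
```

Preserving original order matters: the model sees a chronologically coherent (if thinned) history rather than a relevance-sorted one.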
Continuous Learning System
Flow:
User Interaction
↓
Store in Layer 1-3 (immediate)
↓
Satisfaction Score < 0.7?
↓ Yes
Pattern Recognition (Layer 4)
↓
Generate Improvements
↓
Apply to Agent Configuration
↓
Measure Impact (Layer 5)
↓
Feedback Loop
Metrics:
- Learning cycle: <2s
- Pattern recognition: 85%+ accuracy
- Improvement application: Real-time
- Expected improvement: 15-25% satisfaction gain
Agent Deployment
Specification: OSSA 1.0 compliant
interface AgentSpecification {
id: string;
name: string;
type: 'worker' | 'coordinator' | 'specialist';
capabilities: string[];
skills: Skill[];
tools: Tool[];
configuration: ModelConfig;
resources: ResourceRequirements;
scaling: ScalingPolicy;
memory: MemoryConfig;
}
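A concrete specification might look like the object below. Every value is a made-up example; only the field names come from the interface above, and the nested shapes (Skill, Tool, ModelConfig, etc.) are simplified placeholders:

```typescript
// Hypothetical OSSA 1.0-style agent spec for illustration only.
const exampleSpec = {
  id: 'chat-summarizer-01',
  name: 'Chat Summarizer',
  type: 'worker' as const,
  capabilities: ['summarization', 'context-compression'],
  skills: [],
  tools: [],
  configuration: { model: 'default', temperature: 0.2 },
  resources: { cpu: '500m', memory: '512Mi' },
  scaling: { minReplicas: 1, maxReplicas: 4 },
  memory: { layers: ['conversation', 'knowledge'] },
};
```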
Deployment Targets:
- Local (development)
- BAR (Build-Attest-Run for production)
- Kubernetes (distributed)
- OrbStack (macOS local containers)
Service Integration
LLM Gateway Integration
Purpose: Multi-model routing and cost optimization
Connection: HTTP REST client
- Base URL: process.env.LLM_GATEWAY_URL (default: http://localhost:4000)
- Authentication: JWT tokens
- Retry policy: Exponential backoff
Features:
- Automatic model selection based on task
- Fallback routing on failures
- Cost tracking per request
- A/B testing support
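The exponential-backoff retry policy can be sketched as a delay schedule plus a wrapper around the gateway call. The base delay and attempt cap below are illustrative assumptions, not the client's actual configuration:

```typescript
// Delay before retrying attempt n: base * 2^(n-1), e.g. 100ms, 200ms, 400ms.
function backoffDelayMs(attempt: number, baseMs = 100): number {
  return baseMs * 2 ** (attempt - 1);
}

// Hypothetical retry wrapper around an LLM Gateway request.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

Production clients usually add jitter to the delay so that many instances retrying at once do not synchronize their load spikes.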
Vector Hub Integration
Purpose: Semantic search and RAG
Connection: Qdrant HTTP client
- Base URL: process.env.VECTOR_HUB_URL (default: http://localhost:6333)
- Collections: Conversation history, knowledge base, user profiles
Operations:
- Embedding generation (via LLM Gateway)
- Similarity search (HNSW index)
- Upsert on new messages
- Hybrid search (vector + keyword)
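Hybrid search merges the vector and keyword rankings. One common approach is a weighted linear fusion of the two normalized scores; the weighting below is an illustrative assumption, not the service's actual fusion method:

```typescript
// Weighted linear fusion of normalized vector and keyword scores.
// alpha favors semantic similarity; alpha = 0.7 is an arbitrary example.
function hybridScore(vectorScore: number, keywordScore: number, alpha = 0.7): number {
  return alpha * vectorScore + (1 - alpha) * keywordScore;
}
```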
Agent Mesh Coordinator
Purpose: Distributed agent coordination
Connection: gRPC/HTTP hybrid
- Base URL: process.env.AGENT_MESH_URL (default: http://localhost:3005)
- Protocol: OSSA 1.0
Capabilities:
- Agent discovery and registration
- Load balancing across instances
- Health monitoring
- Task routing
Drupal Consumer
Purpose: SSO and user synchronization
Connection: REST API client
- Base URL: process.env.DRUPAL_API_URL
- Authentication: OAuth2 token exchange
Features:
- Single Sign-On (SSO)
- User role mapping
- Content API access
- Webhook notifications
Observability
Phoenix Arize Tracing
Configuration: src/infrastructure/phoenix-config.ts
Instrumentation:
- LLM request/response traces
- Embedding generation tracking
- Agent interaction spans
- Model comparison metrics
Endpoint: process.env.PHOENIX_COLLECTOR_ENDPOINT (default: http://localhost:6006)
Prometheus Metrics
Exported Metrics:
# Counters
agent_chat_requests_total{method, status, model}
agent_chat_tokens_used{model, type}
agent_chat_errors_total{type, model}
# Gauges
agent_chat_active_sessions
agent_chat_active_users
agent_chat_memory_usage_bytes
# Histograms
agent_chat_latency_seconds{endpoint, model}
agent_chat_token_reduction_ratio
agent_chat_satisfaction_score
Endpoint: http://localhost:9090/metrics
Structured Logging
Library: Winston 3.x
Transports:
- Console (development)
- File rotation (production)
- Loki push (optional)
Format: JSON with timestamp, level, message, context
Security
Authentication
Methods:
- JWT tokens (stateless)
- Session cookies (Redis-backed)
- Drupal SSO (OAuth2)
- API keys (for service-to-service)
Authorization
Model: Role-based access control (RBAC)
Roles:
- admin: Full access
- user: Standard chat access
- service: API-only access
- guest: Read-only (limited)
Rate Limiting
Implementation: express-rate-limit
Limits:
- REST API: 100 req/min per IP
- WebSocket: 50 messages/min per session
- GraphQL: 200 queries/min per user
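The counting behind these limits can be sketched without the middleware itself: a simplified in-memory fixed-window counter keyed by IP, session, or user. express-rate-limit layers shared stores and standard rate-limit headers on top of this idea:

```typescript
// Minimal fixed-window rate limiter; mirrors the 100 req/min REST limit
// when constructed as new FixedWindowLimiter(100, 60_000).
class FixedWindowLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();

  constructor(private limit: number, private windowMs: number) {}

  // Returns true if the request is allowed, false if the key is over limit.
  allow(key: string, now: number): boolean {
    const entry = this.counts.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.counts.set(key, { windowStart: now, count: 1 });
      return true;
    }
    entry.count++;
    return entry.count <= this.limit;
  }
}
```

In a multi-instance deployment the counter must live in Redis rather than process memory, or each instance would enforce the limit independently.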
Data Security
- Passwords: bcrypt (12 rounds)
- Tokens: JWT with RS256
- Encryption: TLS 1.3 in transit
- Secrets: Environment variables (never committed)
Scalability
Horizontal Scaling
Stateless Design:
- Session state in Redis (shared)
- No local file storage
- Database connection pooling
Load Balancing:
- Nginx reverse proxy
- Kubernetes service mesh
- WebSocket sticky sessions
Performance Optimization
Caching Strategy:
- Redis for hot data (sessions, recent conversations)
- PostgreSQL for warm data (user profiles)
- Qdrant for semantic search
Database Optimization:
- Connection pooling (max 20 per instance)
- Prepared statements
- Indexes on frequent queries
- Periodic vacuum and analyze
Monitoring Thresholds
Alerts:
- Latency p99 > 2s
- Error rate > 5%
- Memory usage > 80%
- WebSocket connections > 4000 per instance
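The p99 latency alert is the 99th percentile of observed request latencies. Prometheus estimates this from histogram buckets, but the nearest-rank definition over raw samples is worth sketching for intuition:

```typescript
// Nearest-rank percentile over raw samples; p in (0, 100].
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```

So with the thresholds above, an alert fires when `percentile(latenciesSeconds, 99) > 2` over the evaluation window.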
Related Pages:
- API Reference - Endpoint documentation
- Integration Guide - Integration patterns
- Development - Local development setup
Last Updated: 2025-11-02