Agent Chat - Architecture
System Design, Components, and Integration Patterns
Overview
Agent Chat implements a microservices-based architecture with real-time WebSocket communication, distributed state management, and deep integration with the LLM Platform ecosystem. The system is designed for enterprise scalability, observability, and LibreChat API compatibility.
System Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Client Layer │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ React UI │ │ WebSocket │ │ Claude Desktop MCP │ │
│ │ (Browser) │ │ Client │ │ Integration │ │
│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ API Gateway Layer │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ Express │ │ Socket.IO │ │ Apollo GraphQL │ │
│ │ REST API │ │ WebSocket │ │ Server │ │
│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
┌───────────────────┼───────────────────┐
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐
│ Service │ │ Agent OS │ │ LibreChat Enhanced │
│ Layer │ │ Layer │ │ Integration │
└──────────────┘ └──────────────┘ └──────────────────────┘
│ │ │
└───────────────────┼───────────────────┘
▼
┌─────────────────────────────────────────────────────────────────┐
│ Integration Layer │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ LLM Gateway │ │ Agent Mesh │ │ Knowledge Graph │ │
│ │ (Routing) │ │ Coordinator │ │ (Neo4j) │ │
│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
┌───────────────────┼───────────────────┐
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐
│ PostgreSQL │ │ Redis │ │ Qdrant │
│ (Persistent)│ │ (Sessions) │ │ (Vector Search) │
└──────────────┘ └──────────────┘ └──────────────────────┘
Core Components
1. HTTP/REST API Server (Express)
File: src/server.js
Responsibilities:
- HTTP request handling and routing
- Middleware stack (security, CORS, compression)
- Health checks and API info endpoints
- Integration with ecosystem services
Key Features:
- Helmet security headers
- CORS configuration for multi-origin support
- Compression for response optimization
- Request logging with Winston
- Graceful shutdown handling
Endpoints:
GET /health // Service health check
GET /api/info // API information and features
POST /api/chat // Chat completion
POST /api/search // Vector search
POST /api/agents // Agent orchestration
GET /metrics // Prometheus metrics
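The /health endpoint reduces to a small payload builder. A minimal sketch, assuming the handler summarizes dependency checks into a single status flag; `buildHealthStatus`, its field names, and the service label are illustrative, not the actual shape in src/server.js:

```typescript
// Hypothetical shape of the GET /health response payload.
interface HealthStatus {
  status: 'ok' | 'degraded';
  uptimeSeconds: number;
  service: string;
}

// Dependency checks (Redis, PostgreSQL, Qdrant) are collapsed
// into one boolean for this sketch.
function buildHealthStatus(
  uptimeSeconds: number,
  dependenciesHealthy: boolean,
): HealthStatus {
  return {
    status: dependenciesHealthy ? 'ok' : 'degraded',
    uptimeSeconds,
    service: 'agent-chat',
  };
}
```

A load balancer would treat anything other than `status: 'ok'` as a signal to drain the instance.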
2. WebSocket Layer (Socket.IO)
File: src/websocket/index.js
Responsibilities:
- Real-time bidirectional communication
- Streaming LLM responses
- Live collaboration features
- Voice assistant integration
- Real-time progress broadcasts
Event Types:
// Client → Server
'chat_message' // User sends message
'voice_command' // Voice input from Echo
'agent_action' // Agent task request
'subscription' // Subscribe to updates
// Server → Client
'chat_response' // AI response stream
'voice_feedback' // Voice assistant reply
'agent_progress' // Real-time task updates
'collaboration' // Multi-user events
Features:
- Connection pooling and management
- Automatic reconnection
- Room-based broadcasts
- Message queuing and delivery guarantees
- CORS and origin validation
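The client→server and server→client event types above pair up one-to-one; a typed map makes the pairing explicit. The mapping itself is an illustration inferred from the event lists, not a structure that exists in src/websocket/index.js:

```typescript
type ClientEvent = 'chat_message' | 'voice_command' | 'agent_action' | 'subscription';
type ServerEvent = 'chat_response' | 'voice_feedback' | 'agent_progress' | 'collaboration';

// Each inbound event is answered on a corresponding outbound channel.
const responseChannel: Record<ClientEvent, ServerEvent> = {
  chat_message: 'chat_response',
  voice_command: 'voice_feedback',
  agent_action: 'agent_progress',
  subscription: 'collaboration',
};
```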
3. GraphQL API Server (Apollo)
File: src/api/server.ts
Responsibilities:
- Structured data queries and mutations
- Real-time subscriptions
- Type-safe API schema
- Federation with other microservices
Schema Highlights:
type Query {
conversations(limit: Int, offset: Int): [Conversation!]!
conversation(id: ID!): Conversation
models: [ModelInfo!]!
searchConversations(query: String!): [Conversation!]!
agentHealth(agentId: ID!): AgentHealth!
}
type Mutation {
sendMessage(input: MessageInput!): Message!
createConversation(title: String, model: String): Conversation!
deleteConversation(id: ID!): Boolean!
updateUserPreferences(input: PreferencesInput!): User!
}
type Subscription {
messageStream(conversationId: ID!): Message!
agentProgress(agentId: ID!): ProgressUpdate!
collaborationEvent(roomId: ID!): CollaborationEvent!
}
State Management
5-Layer Enterprise Memory System
Architecture: src/agent-os/memory/EnterpriseMemoryPlugin.ts
Layer 1: Conversation Memory (Redis)
- Purpose: Short-term context for active sessions
- TTL: 1 hour (configurable)
- Storage: Serialized conversation turns
- Access Pattern: Read-heavy, write-on-message
interface ConversationMemory {
sessionId: string;
messages: Message[];
turnCount: number;
activeContext: Record<string, any>;
lastActivity: Date;
}
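Layer 1 access reduces to a Redis key scheme plus JSON serialization written with the 1-hour TTL. A sketch under assumptions: the `conv:<sessionId>` key format and helper names are hypothetical, not taken from EnterpriseMemoryPlugin.ts:

```typescript
const CONVERSATION_TTL_SECONDS = 3600; // 1 hour, configurable per the layer spec

// Hypothetical key scheme for Layer 1 entries.
function conversationKey(sessionId: string): string {
  return `conv:${sessionId}`;
}

// Serialize the turn set for a Redis SET ... EX CONVERSATION_TTL_SECONDS write.
function serializeConversation(memory: { sessionId: string; turnCount: number }): string {
  return JSON.stringify(memory);
}
```

The read-heavy access pattern then becomes a single GET + parse on each message, with the TTL refreshed on write.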
Layer 2: User Memory (PostgreSQL)
- Purpose: User preferences, history, satisfaction scores
- Persistence: Permanent
- Storage: Relational tables with indexes
- Access Pattern: Read on session start, write on feedback
interface UserMemory {
userId: string;
preferences: ModelPreferences;
satisfactionScore: number;
totalConversations: number;
topics: string[];
learningHistory: LearningEvent[];
}
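The `satisfactionScore` field can be maintained as a running average over `totalConversations`, written once per feedback event. The incremental formula below is an assumption about how the score is kept, not something stated in the source:

```typescript
// Fold one new per-conversation score into the running average.
function updateSatisfaction(
  currentScore: number,
  totalConversations: number,
  newScore: number,
): { satisfactionScore: number; totalConversations: number } {
  const total = totalConversations + 1;
  return {
    satisfactionScore: (currentScore * totalConversations + newScore) / total,
    totalConversations: total,
  };
}
```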
Layer 3: Knowledge Memory (Qdrant)
- Purpose: Semantic search and RAG
- Vectors: 384-dimensional embeddings
- Storage: HNSW index for fast similarity
- Access Pattern: Query on every message for context
interface KnowledgeMemory {
vectors: Float32Array;
metadata: {
source: string;
timestamp: Date;
relevanceScore: number;
};
facts: string[];
confidence: number;
}
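The similarity behind the HNSW lookup is typically cosine similarity over the 384-dimensional embeddings. Qdrant computes this server-side, but the math is simple enough to sketch for intuition:

```typescript
// Cosine similarity between two embeddings; 1.0 means identical direction,
// 0.0 means orthogonal (unrelated) vectors.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```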
Layer 4: Learning Memory (PostgreSQL)
- Purpose: Continuous improvement from feedback
- Storage: Interaction corrections, patterns, improvements
- Processing: Async background jobs
- Access Pattern: Write-heavy, periodic batch reads
interface LearningMemory {
interactions: Interaction[];
patterns: Pattern[];
improvements: Improvement[];
successMetrics: {
accuracy: number;
satisfactionGain: number;
};
}
Layer 5: Performance Memory (Prometheus)
- Purpose: System metrics and health
- Retention: 30 days
- Metrics: Latency, throughput, error rates, token usage
- Access Pattern: Continuous time-series writes
interface PerformanceMemory {
latency: Histogram;
throughput: Counter;
errorRate: Gauge;
tokenUsage: Counter;
modelDistribution: Summary;
}
Agent OS Integration
VORTEX v3 Token Optimization
Implementation: src/agent-os/core/AgentLauncherEndpoint.ts
Mechanism:
1. Context Compression: Semantic summarization of conversation history
2. Smart Truncation: Remove low-relevance turns while preserving context
3. Template Optimization: Reusable prompt templates
4. Caching: Redis cache for repeated queries
Performance:
- 30-50% token reduction
- <100ms overhead for compression
- Maintains context quality (95%+ retention)
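The smart-truncation step can be sketched as a greedy keep-by-relevance pass under a token budget. The `Turn` shape, relevance scores, and greedy strategy here are illustrative assumptions, not the VORTEX implementation in AgentLauncherEndpoint.ts:

```typescript
interface Turn {
  text: string;
  relevance: number; // 0..1, higher = more useful context
  tokens: number;
}

// Keep the highest-relevance turns that fit the token budget,
// then restore original conversation order.
function truncateToBudget(turns: Turn[], budget: number): Turn[] {
  const byRelevance = [...turns].sort((a, b) => b.relevance - a.relevance);
  const kept = new Set<Turn>();
  let used = 0;
  for (const turn of byRelevance) {
    if (used + turn.tokens <= budget) {
      kept.add(turn);
      used += turn.tokens;
    }
  }
  return turns.filter((t) => kept.has(t));
}
```

Preserving original order matters: the model sees a chronologically coherent (if thinned) history rather than a relevance-sorted one.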
Continuous Learning System
Flow:
User Interaction
↓
Store in Layer 1-3 (immediate)
↓
Satisfaction Score < 0.7?
↓ Yes
Pattern Recognition (Layer 4)
↓
Generate Improvements
↓
Apply to Agent Configuration
↓
Measure Impact (Layer 5)
↓
Feedback Loop
Metrics:
- Learning cycle: <2s
- Pattern recognition: 85%+ accuracy
- Improvement application: Real-time
- Expected improvement: 15-25% satisfaction gain
Agent Deployment
Specification: OSSA 1.0 compliant
interface AgentSpecification {
id: string;
name: string;
type: 'worker' | 'coordinator' | 'specialist';
capabilities: string[];
skills: Skill[];
tools: Tool[];
configuration: ModelConfig;
resources: ResourceRequirements;
scaling: ScalingPolicy;
memory: MemoryConfig;
}
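A concrete specification might look like the object below. Every value is a made-up example; only the field names come from the interface above, and the nested shapes (Skill, Tool, ModelConfig, etc.) are simplified placeholders:

```typescript
// Hypothetical OSSA 1.0-style agent spec for illustration only.
const exampleSpec = {
  id: 'chat-summarizer-01',
  name: 'Chat Summarizer',
  type: 'worker' as const,
  capabilities: ['summarization', 'context-compression'],
  skills: [],
  tools: [],
  configuration: { model: 'default', temperature: 0.2 },
  resources: { cpu: '500m', memory: '512Mi' },
  scaling: { minReplicas: 1, maxReplicas: 4 },
  memory: { layers: ['conversation', 'knowledge'] },
};
```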
Deployment Targets:
- Local (development)
- BAR (Build-Attest-Run for production)
- Kubernetes (distributed)
- OrbStack (macOS local containers)
Service Integration
LLM Gateway Integration
Purpose: Multi-model routing and cost optimization
Connection: HTTP REST client
- Base URL: process.env.LLM_GATEWAY_URL (default: http://localhost:4000)
- Authentication: JWT tokens
- Retry policy: Exponential backoff
Features:
- Automatic model selection based on task
- Fallback routing on failures
- Cost tracking per request
- A/B testing support
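The exponential-backoff retry policy can be sketched as a delay schedule plus a wrapper around the gateway call. The base delay and attempt cap below are illustrative assumptions, not the client's actual configuration:

```typescript
// Delay before retrying attempt n: base * 2^(n-1), e.g. 100ms, 200ms, 400ms.
function backoffDelayMs(attempt: number, baseMs = 100): number {
  return baseMs * 2 ** (attempt - 1);
}

// Hypothetical retry wrapper around an LLM Gateway request.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

Production clients usually add jitter to the delay so that many instances retrying at once do not synchronize their load spikes.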
Vector Hub Integration
Purpose: Semantic search and RAG
Connection: Qdrant HTTP client
- Base URL: process.env.VECTOR_HUB_URL (default: http://localhost:6333)
- Collections: Conversation history, knowledge base, user profiles
Operations:
- Embedding generation (via LLM Gateway)
- Similarity search (HNSW index)
- Upsert on new messages
- Hybrid search (vector + keyword)
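Hybrid search merges the vector and keyword rankings. One common approach is a weighted linear fusion of the two normalized scores; the weighting below is an illustrative assumption, not the service's actual fusion method:

```typescript
// Weighted linear fusion of normalized vector and keyword scores.
// alpha favors semantic similarity; alpha = 0.7 is an arbitrary example.
function hybridScore(vectorScore: number, keywordScore: number, alpha = 0.7): number {
  return alpha * vectorScore + (1 - alpha) * keywordScore;
}
```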
Agent Mesh Coordinator
Purpose: Distributed agent coordination
Connection: gRPC/HTTP hybrid
- Base URL: process.env.AGENT_MESH_URL (default: http://localhost:3005)
- Protocol: OSSA 1.0
Capabilities:
- Agent discovery and registration
- Load balancing across instances
- Health monitoring
- Task routing
Drupal Consumer
Purpose: SSO and user synchronization
Connection: REST API client
- Base URL: process.env.DRUPAL_API_URL
- Authentication: OAuth2 token exchange
Features:
- Single Sign-On (SSO)
- User role mapping
- Content API access
- Webhook notifications
Observability
Phoenix Arize Tracing
Configuration: src/infrastructure/phoenix-config.ts
Instrumentation:
- LLM request/response traces
- Embedding generation tracking
- Agent interaction spans
- Model comparison metrics
Endpoint: process.env.PHOENIX_COLLECTOR_ENDPOINT (default: http://localhost:6006)
Prometheus Metrics
Exported Metrics:
# Counters
agent_chat_requests_total{method, status, model}
agent_chat_tokens_used{model, type}
agent_chat_errors_total{type, model}
# Gauges
agent_chat_active_sessions
agent_chat_active_users
agent_chat_memory_usage_bytes
# Histograms
agent_chat_latency_seconds{endpoint, model}
agent_chat_token_reduction_ratio
agent_chat_satisfaction_score
Endpoint: http://localhost:9090/metrics
Structured Logging
Library: Winston 3.x
Transports:
- Console (development)
- File rotation (production)
- Loki push (optional)
Format: JSON with timestamp, level, message, context
Security
Authentication
Methods:
- JWT tokens (stateless)
- Session cookies (Redis-backed)
- Drupal SSO (OAuth2)
- API keys (for service-to-service)
Authorization
Model: Role-based access control (RBAC)
Roles:
- admin: Full access
- user: Standard chat access
- service: API-only access
- guest: Read-only (limited)
Rate Limiting
Implementation: express-rate-limit
Limits:
- REST API: 100 req/min per IP
- WebSocket: 50 messages/min per session
- GraphQL: 200 queries/min per user
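The counting behind these limits can be sketched without the middleware itself: a simplified in-memory fixed-window counter keyed by IP, session, or user. express-rate-limit layers shared stores and standard rate-limit headers on top of this idea:

```typescript
// Minimal fixed-window rate limiter; mirrors the 100 req/min REST limit
// when constructed as new FixedWindowLimiter(100, 60_000).
class FixedWindowLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();

  constructor(private limit: number, private windowMs: number) {}

  // Returns true if the request is allowed, false if the key is over limit.
  allow(key: string, now: number): boolean {
    const entry = this.counts.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.counts.set(key, { windowStart: now, count: 1 });
      return true;
    }
    entry.count++;
    return entry.count <= this.limit;
  }
}
```

In a multi-instance deployment the counter must live in Redis rather than process memory, or each instance would enforce the limit independently.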
Data Security
- Passwords: bcrypt (12 rounds)
- Tokens: JWT with RS256
- Encryption: TLS 1.3 in transit
- Secrets: Environment variables (never committed)
Scalability
Horizontal Scaling
Stateless Design:
- Session state in Redis (shared)
- No local file storage
- Database connection pooling
Load Balancing:
- Nginx reverse proxy
- Kubernetes service mesh
- WebSocket sticky sessions
Performance Optimization
Caching Strategy:
- Redis for hot data (sessions, recent conversations)
- PostgreSQL for warm data (user profiles)
- Qdrant for semantic search
Database Optimization:
- Connection pooling (max 20 per instance)
- Prepared statements
- Indexes on frequent queries
- Periodic vacuum and analyze
Monitoring Thresholds
Alerts:
- Latency p99 > 2s
- Error rate > 5%
- Memory usage > 80%
- WebSocket connections > 4000 per instance
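The p99 latency alert is the 99th percentile of observed request latencies. Prometheus estimates this from histogram buckets, but the nearest-rank definition over raw samples is worth sketching for intuition:

```typescript
// Nearest-rank percentile over raw samples; p in (0, 100].
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```

So with the thresholds above, an alert fires when `percentile(latenciesSeconds, 99) > 2` over the evaluation window.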
Related Pages:
- API Reference - Endpoint documentation
- Integration Guide - Integration patterns
- Development - Local development setup
Last Updated: 2025-11-02