Agent Router Developer Guide
Overview
Package: @bluefly/agent-router
Version: Latest
License: GPL-2.0+
Unified LLM Gateway with multi-provider routing, LibreChat integration, and GraphQL federation. Production-grade API gateway for AI services.
Key Features
- Unified LLM API: Single endpoint for OpenAI, Anthropic, HuggingFace, Ollama
- GraphQL Federation: Federated graph across microservices
- Intelligent Routing: Cost/latency/capability-based model selection
- Authentication & Authorization: JWT, OAuth2, API keys, RBAC
- Rate Limiting & Caching: Redis-backed rate limiter and LRU cache
- Observability: Phoenix Arize, OpenTelemetry, Prometheus, Jaeger
- MCP Server: Model Context Protocol server
- LibreChat Integration: SSO with Drupal, custom endpoints
Installation
npm install @bluefly/agent-router
Quick Start
Unified Fastify Server
# Development mode
npm run dev:unified
# Production mode
npm run build
npm run start:unified
# Access services
# REST API: http://localhost:3000/api/v1
# GraphQL: http://localhost:3000/graphql
# Health: http://localhost:3000/health
# Metrics: http://localhost:3000/metrics
Basic Usage
import { AgentRouter } from '@bluefly/agent-router';

const router = new AgentRouter({
  providers: {
    anthropic: { apiKey: process.env.ANTHROPIC_API_KEY },
    openai: { apiKey: process.env.OPENAI_API_KEY },
  },
  routing: {
    strategy: 'cost-optimized', // or 'latency-optimized', 'capability-matched'
    fallback: true,
  },
});

// Make LLM request
const response = await router.complete({
  prompt: 'Explain quantum computing',
  maxTokens: 1000,
  temperature: 0.7,
  // Provider auto-selected based on routing strategy
});
API Reference
Unified LLM API
POST /api/v1/completions
Create a completion. With "model": "auto", the router selects a provider according to the configured routing strategy.
Request:
{
  "prompt": "Explain quantum computing",
  "model": "auto",
  "maxTokens": 1000,
  "temperature": 0.7
}
Response:
{
  "id": "cmpl-123",
  "model": "claude-3-5-sonnet-20241022",
  "choices": [{
    "text": "Quantum computing...",
    "finishReason": "stop"
  }],
  "usage": {
    "promptTokens": 10,
    "completionTokens": 150,
    "totalTokens": 160
  },
  "cost": 0.0024
}
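The endpoint can also be exercised directly over HTTP. A minimal TypeScript sketch that builds the request body shown above; the helper name and its defaults are illustrative assumptions mirroring the example payload, not part of the package API:

```typescript
// Hypothetical helper that builds a body for POST /api/v1/completions,
// matching the request shape documented above.
interface CompletionRequest {
  prompt: string;
  model: string;
  maxTokens: number;
  temperature: number;
}

function buildCompletionRequest(
  prompt: string,
  opts: Partial<Omit<CompletionRequest, 'prompt'>> = {}
): CompletionRequest {
  return {
    prompt,
    model: opts.model ?? 'auto',        // 'auto' lets the router pick a provider
    maxTokens: opts.maxTokens ?? 1000,
    temperature: opts.temperature ?? 0.7,
  };
}

// Serialize and send with any HTTP client, e.g. fetch:
const payload = JSON.stringify(buildCompletionRequest('Explain quantum computing'));
```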
GraphQL Federation
# Query across federated services
query GetAgentWithContext {
  agent(id: "agent-001") {
    id
    name
    capabilities
    # Federated from agent-brain
    knowledgeGraph {
      nodes
      relationships
    }
    # Federated from agent-tracer
    traces {
      spans
      duration
      status
    }
  }
}
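A federated query like the one above can be posted to the /graphql endpoint with any GraphQL-over-HTTP client. A hedged sketch, assuming a standard GraphQL server at the Quick Start URL and Node 18+ (global fetch); the trimmed query and helper are illustrative, not part of the package:

```typescript
// Minimal GraphQL-over-HTTP client sketch for the /graphql endpoint.
const AGENT_QUERY = `
  query GetAgent($id: ID!) {
    agent(id: $id) {
      id
      name
      capabilities
    }
  }
`;

async function fetchAgent(id: string, endpoint = 'http://localhost:3000/graphql') {
  const res = await fetch(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: AGENT_QUERY, variables: { id } }),
  });
  const { data, errors } = await res.json();
  if (errors) throw new Error(errors[0].message);
  return data.agent;
}
```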
Routing Strategies
// Cost optimization
const costRouter = new AgentRouter({
  routing: {
    strategy: 'cost-optimized',
    preferredProviders: ['anthropic', 'openai'],
    maxCostPerRequest: 0.01
  }
});

// Latency optimization
const latencyRouter = new AgentRouter({
  routing: {
    strategy: 'latency-optimized',
    maxLatency: 2000, // ms
    enableParallelRequests: true
  }
});

// Capability matching
const capabilityRouter = new AgentRouter({
  routing: {
    strategy: 'capability-matched',
    requiredCapabilities: ['function-calling', 'vision']
  }
});
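To make the cost-optimized policy concrete, here is an illustrative sketch (not the library's internals; all names are hypothetical) of the selection rule implied by maxCostPerRequest: filter providers to those within budget, then take the cheapest.

```typescript
// Hypothetical per-request cost quote for a provider.
interface ProviderQuote {
  name: string;
  costPerRequest: number; // estimated USD for this request
}

// Pick the cheapest provider whose quote fits the budget,
// or null when no provider is affordable (a fallback could then kick in).
function pickCostOptimized(
  quotes: ProviderQuote[],
  maxCostPerRequest: number
): string | null {
  const affordable = quotes
    .filter(q => q.costPerRequest <= maxCostPerRequest)
    .sort((a, b) => a.costPerRequest - b.costPerRequest);
  return affordable.length > 0 ? affordable[0].name : null;
}
```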
Configuration
Environment Variables
# AI Providers
ANTHROPIC_API_KEY=your-anthropic-api-key
OPENAI_API_KEY=your-openai-api-key
HUGGINGFACE_API_KEY=your-huggingface-api-key
# Databases
DATABASE_URL=postgresql://user:pass@localhost:5432/llm_platform
REDIS_URL=redis://localhost:6379
MONGODB_URL=mongodb://localhost:27017/audit_logs
# Authentication
JWT_SECRET=your-jwt-secret
JWT_EXPIRATION=24h
OAUTH2_CLIENT_ID=drupal-client-id
# Observability
PHOENIX_COLLECTOR_ENDPOINT=http://localhost:6006
JAEGER_ENDPOINT=http://localhost:14268/api/traces
PROMETHEUS_PORT=9090
# Features
ENABLE_CACHING=true
ENABLE_RATE_LIMITING=true
ENABLE_GRAPHQL=true
ENABLE_MCP=true
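The feature flags above might be mapped into a typed config object along these lines; the variable names come from the list, while the parsing rule (only the literal string "true" enables a feature) is an assumption:

```typescript
// Sketch: parse ENABLE_* environment variables into typed feature flags.
interface FeatureFlags {
  caching: boolean;
  rateLimiting: boolean;
  graphql: boolean;
  mcp: boolean;
}

function loadFeatureFlags(env: Record<string, string | undefined>): FeatureFlags {
  // Unset or any value other than "true" disables the feature.
  const on = (v?: string) => v === 'true';
  return {
    caching: on(env.ENABLE_CACHING),
    rateLimiting: on(env.ENABLE_RATE_LIMITING),
    graphql: on(env.ENABLE_GRAPHQL),
    mcp: on(env.ENABLE_MCP),
  };
}
```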
Examples
Caching Layer
// Enable request caching
const router = new AgentRouter({
  caching: {
    enabled: true,
    ttl: 3600, // 1 hour
    storage: 'redis',
    keyPrefix: 'llm:'
  }
});

// Cached request
const response = await router.complete({
  prompt: 'What is the capital of France?',
  cache: true
});
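One way such a cache could key entries, shown as a sketch: hash the prompt and prepend the configured keyPrefix so identical prompts hit the same Redis entry. The function is illustrative, not the package's actual key scheme.

```typescript
import { createHash } from 'crypto';

// Deterministic cache key: same prompt -> same key, prefixed for namespacing.
function cacheKey(prompt: string, keyPrefix = 'llm:'): string {
  const digest = createHash('sha256').update(prompt).digest('hex').slice(0, 16);
  return `${keyPrefix}${digest}`;
}
```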
Rate Limiting
// Configure rate limiting
const router = new AgentRouter({
  rateLimiting: {
    enabled: true,
    windowMs: 60000, // 1 minute
    maxRequests: 100,
    perUser: true
  }
});
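The windowMs/maxRequests policy above amounts to a fixed-window counter per user. A minimal in-memory sketch of that policy (the package itself is Redis-backed; this class only illustrates the semantics):

```typescript
// Fixed-window rate limiter: at most maxRequests per user per windowMs.
class FixedWindowLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();

  constructor(private windowMs: number, private maxRequests: number) {}

  allow(userId: string, now: number = Date.now()): boolean {
    const entry = this.counts.get(userId);
    // New user, or the previous window has elapsed: start a fresh window.
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.counts.set(userId, { windowStart: now, count: 1 });
      return true;
    }
    // Within the current window: admit until the cap is reached.
    if (entry.count < this.maxRequests) {
      entry.count++;
      return true;
    }
    return false;
  }
}
```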
MCP Server
import { MCPServer } from '@bluefly/agent-router/mcp';

const mcp = new MCPServer({
  tools: [
    {
      name: 'complete_llm',
      description: 'Complete text using LLM',
      schema: { /* JSON Schema */ },
      handler: async (params) => {
        return await router.complete(params);
      },
    },
  ],
});

await mcp.start();
Observability
Phoenix Arize
# View AI-specific traces
open http://localhost:6006
# Monitor LLM performance
curl http://localhost:3000/api/v1/observability/llm
Prometheus Metrics
# Export metrics
curl http://localhost:9090/metrics
# Key metrics:
# - llm_requests_total
# - llm_request_duration_seconds
# - llm_tokens_used_total
# - provider_availability
# - cache_hit_rate
Testing
npm test
npm run test:integration
npm run test:e2e
npm run test:performance
Deployment
Kubernetes
kubectl apply -f infrastructure/kubernetes/production/
kubectl get pods -n agent-router
Docker
npm run docker:build
npm run docker:compose
Performance
- Throughput: 10,000+ requests/second with caching
- Latency: <50ms p99 (excluding LLM provider time)
- Availability: 99.99% uptime with multi-provider failover
- Cache Hit Rate: 40-60% for typical workloads
- Token Cost Savings: 30-50% with optimal routing
Related Packages
- @bluefly/agent-protocol - MCP protocol
- @bluefly/agent-mesh - Agent coordination
- @bluefly/agent-brain - Knowledge graph
Documentation
- GitLab: https://gitlab.bluefly.io/llm/npm/agent-router
- OpenAPI Spec: openapi.yaml