Agent Router Developer Guide
Overview
Package: @bluefly/agent-router
Version: Latest
License: GPL-2.0+
Unified LLM Gateway with multi-provider routing, LibreChat integration, and GraphQL federation. Production-grade API gateway for AI services.
Key Features
- Unified LLM API: Single endpoint for OpenAI, Anthropic, HuggingFace, Ollama
- GraphQL Federation: Federated graph across microservices
- Intelligent Routing: Cost/latency/capability-based model selection
- Authentication & Authorization: JWT, OAuth2, API keys, RBAC
- Rate Limiting & Caching: Redis-backed rate limiter and LRU cache
- Observability: Phoenix Arize, OpenTelemetry, Prometheus, Jaeger
- MCP Server: Model Context Protocol server
- LibreChat Integration: SSO with Drupal, custom endpoints
Installation
npm install @bluefly/agent-router
Quick Start
Unified Fastify Server
# Development mode
npm run dev:unified
# Production mode
npm run build
npm run start:unified
# Access services
# REST API: http://localhost:3000/api/v1
# GraphQL: http://localhost:3000/graphql
# Health: http://localhost:3000/health
# Metrics: http://localhost:3000/metrics
Basic Usage
import { AgentRouter } from '@bluefly/agent-router';

const router = new AgentRouter({
  providers: {
    anthropic: { apiKey: process.env.ANTHROPIC_API_KEY },
    openai: { apiKey: process.env.OPENAI_API_KEY },
  },
  routing: {
    strategy: 'cost-optimized', // or 'latency-optimized', 'capability-matched'
    fallback: true,
  },
});

// Make LLM request
const response = await router.complete({
  prompt: 'Explain quantum computing',
  maxTokens: 1000,
  temperature: 0.7,
  // Provider auto-selected based on routing strategy
});
API Reference
Unified LLM API
POST /api/v1/completions
Create a completion. With "model": "auto", the router selects a provider according to the configured routing strategy.
Request:
{
  "prompt": "Explain quantum computing",
  "model": "auto",
  "maxTokens": 1000,
  "temperature": 0.7
}
Response:
{
  "id": "cmpl-123",
  "model": "claude-3-5-sonnet-20241022",
  "choices": [{
    "text": "Quantum computing...",
    "finishReason": "stop"
  }],
  "usage": {
    "promptTokens": 10,
    "completionTokens": 150,
    "totalTokens": 160
  },
  "cost": 0.0024
}
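The endpoint can also be exercised directly over HTTP. A minimal TypeScript sketch that builds the request body shown above; the helper name and its defaults are illustrative assumptions mirroring the example payload, not part of the package API:

```typescript
// Hypothetical helper that builds a body for POST /api/v1/completions,
// matching the request shape documented above.
interface CompletionRequest {
  prompt: string;
  model: string;
  maxTokens: number;
  temperature: number;
}

function buildCompletionRequest(
  prompt: string,
  opts: Partial<Omit<CompletionRequest, 'prompt'>> = {}
): CompletionRequest {
  return {
    prompt,
    model: opts.model ?? 'auto',        // 'auto' lets the router pick a provider
    maxTokens: opts.maxTokens ?? 1000,
    temperature: opts.temperature ?? 0.7,
  };
}

// Serialize and send with any HTTP client, e.g. fetch:
const payload = JSON.stringify(buildCompletionRequest('Explain quantum computing'));
```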
GraphQL Federation
# Query across federated services
query GetAgentWithContext {
  agent(id: "agent-001") {
    id
    name
    capabilities
    # Federated from agent-brain
    knowledgeGraph {
      nodes
      relationships
    }
    # Federated from agent-tracer
    traces {
      spans
      duration
      status
    }
  }
}
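A federated query like the one above can be posted to the /graphql endpoint with any GraphQL-over-HTTP client. A hedged sketch, assuming a standard GraphQL server at the Quick Start URL and Node 18+ (global fetch); the trimmed query and helper are illustrative, not part of the package:

```typescript
// Minimal GraphQL-over-HTTP client sketch for the /graphql endpoint.
const AGENT_QUERY = `
  query GetAgent($id: ID!) {
    agent(id: $id) {
      id
      name
      capabilities
    }
  }
`;

async function fetchAgent(id: string, endpoint = 'http://localhost:3000/graphql') {
  const res = await fetch(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: AGENT_QUERY, variables: { id } }),
  });
  const { data, errors } = await res.json();
  if (errors) throw new Error(errors[0].message);
  return data.agent;
}
```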
Routing Strategies
// Cost optimization
const costRouter = new AgentRouter({
  routing: {
    strategy: 'cost-optimized',
    preferredProviders: ['anthropic', 'openai'],
    maxCostPerRequest: 0.01
  }
});

// Latency optimization
const latencyRouter = new AgentRouter({
  routing: {
    strategy: 'latency-optimized',
    maxLatency: 2000, // ms
    enableParallelRequests: true
  }
});

// Capability matching
const capabilityRouter = new AgentRouter({
  routing: {
    strategy: 'capability-matched',
    requiredCapabilities: ['function-calling', 'vision']
  }
});
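To make the cost-optimized policy concrete, here is an illustrative sketch (not the library's internals; all names are hypothetical) of the selection rule implied by maxCostPerRequest: filter providers to those within budget, then take the cheapest.

```typescript
// Hypothetical per-request cost quote for a provider.
interface ProviderQuote {
  name: string;
  costPerRequest: number; // estimated USD for this request
}

// Pick the cheapest provider whose quote fits the budget,
// or null when no provider is affordable (a fallback could then kick in).
function pickCostOptimized(
  quotes: ProviderQuote[],
  maxCostPerRequest: number
): string | null {
  const affordable = quotes
    .filter(q => q.costPerRequest <= maxCostPerRequest)
    .sort((a, b) => a.costPerRequest - b.costPerRequest);
  return affordable.length > 0 ? affordable[0].name : null;
}
```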
Configuration
Environment Variables
# AI Providers
ANTHROPIC_API_KEY=your-anthropic-api-key
OPENAI_API_KEY=your-openai-api-key
HUGGINGFACE_API_KEY=your-huggingface-api-key
# Databases
DATABASE_URL=postgresql://user:pass@localhost:5432/llm_platform
REDIS_URL=redis://localhost:6379
MONGODB_URL=mongodb://localhost:27017/audit_logs
# Authentication
JWT_SECRET=your-jwt-secret
JWT_EXPIRATION=24h
OAUTH2_CLIENT_ID=drupal-client-id
# Observability
PHOENIX_COLLECTOR_ENDPOINT=http://localhost:6006
JAEGER_ENDPOINT=http://localhost:14268/api/traces
PROMETHEUS_PORT=9090
# Features
ENABLE_CACHING=true
ENABLE_RATE_LIMITING=true
ENABLE_GRAPHQL=true
ENABLE_MCP=true
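The feature flags above might be mapped into a typed config object along these lines; the variable names come from the list, while the parsing rule (only the literal string "true" enables a feature) is an assumption:

```typescript
// Sketch: parse ENABLE_* environment variables into typed feature flags.
interface FeatureFlags {
  caching: boolean;
  rateLimiting: boolean;
  graphql: boolean;
  mcp: boolean;
}

function loadFeatureFlags(env: Record<string, string | undefined>): FeatureFlags {
  // Unset or any value other than "true" disables the feature.
  const on = (v?: string) => v === 'true';
  return {
    caching: on(env.ENABLE_CACHING),
    rateLimiting: on(env.ENABLE_RATE_LIMITING),
    graphql: on(env.ENABLE_GRAPHQL),
    mcp: on(env.ENABLE_MCP),
  };
}
```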
Examples
Caching Layer
// Enable request caching
const router = new AgentRouter({
  caching: {
    enabled: true,
    ttl: 3600, // 1 hour
    storage: 'redis',
    keyPrefix: 'llm:'
  }
});

// Cached request
const response = await router.complete({
  prompt: 'What is the capital of France?',
  cache: true
});
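One way such a cache could key entries, shown as a sketch: hash the prompt and prepend the configured keyPrefix so identical prompts hit the same Redis entry. The function is illustrative, not the package's actual key scheme.

```typescript
import { createHash } from 'crypto';

// Deterministic cache key: same prompt -> same key, prefixed for namespacing.
function cacheKey(prompt: string, keyPrefix = 'llm:'): string {
  const digest = createHash('sha256').update(prompt).digest('hex').slice(0, 16);
  return `${keyPrefix}${digest}`;
}
```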
Rate Limiting
// Configure rate limiting
const router = new AgentRouter({
  rateLimiting: {
    enabled: true,
    windowMs: 60000, // 1 minute
    maxRequests: 100,
    perUser: true
  }
});
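The windowMs/maxRequests policy above amounts to a fixed-window counter per user. A minimal in-memory sketch of that policy (the package itself is Redis-backed; this class only illustrates the semantics):

```typescript
// Fixed-window rate limiter: at most maxRequests per user per windowMs.
class FixedWindowLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();

  constructor(private windowMs: number, private maxRequests: number) {}

  allow(userId: string, now: number = Date.now()): boolean {
    const entry = this.counts.get(userId);
    // New user, or the previous window has elapsed: start a fresh window.
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.counts.set(userId, { windowStart: now, count: 1 });
      return true;
    }
    // Within the current window: admit until the cap is reached.
    if (entry.count < this.maxRequests) {
      entry.count++;
      return true;
    }
    return false;
  }
}
```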
MCP Server
import { MCPServer } from '@bluefly/agent-router/mcp';

const mcp = new MCPServer({
  tools: [
    {
      name: 'complete_llm',
      description: 'Complete text using LLM',
      schema: { /* JSON Schema */ },
      handler: async (params) => {
        return await router.complete(params);
      },
    },
  ],
});

await mcp.start();
Observability
Phoenix Arize
# View AI-specific traces
open http://localhost:6006
# Monitor LLM performance
curl http://localhost:3000/api/v1/observability/llm
Prometheus Metrics
# Export metrics
curl http://localhost:9090/metrics
# Key metrics:
# - llm_requests_total
# - llm_request_duration_seconds
# - llm_tokens_used_total
# - provider_availability
# - cache_hit_rate
Testing
npm test
npm run test:integration
npm run test:e2e
npm run test:performance
Deployment
Kubernetes
kubectl apply -f infrastructure/kubernetes/production/
kubectl get pods -n agent-router
Docker
npm run docker:build
npm run docker:compose
Performance
- Throughput: 10,000+ requests/second with caching
- Latency: <50ms p99 (excluding LLM provider time)
- Availability: 99.99% uptime with multi-provider failover
- Cache Hit Rate: 40-60% for typical workloads
- Token Cost Savings: 30-50% with optimal routing
Related Packages
- @bluefly/agent-protocol - MCP protocol
- @bluefly/agent-mesh - Agent coordination
- @bluefly/agent-brain - Knowledge graph
Documentation
- GitLab: https://gitlab.bluefly.io/llm/npm/agent-router
- OpenAPI Spec: openapi.yaml