Agent Mesh - Home
Distributed Agent Coordination with Tailscale Integration and Zero-Trust Security
Overview
Agent Mesh is a production-grade distributed agent coordination layer that enables automatic agent discovery, intelligent task routing, secure communication, and load balancing across mesh networks. It is built on Tailscale's MagicDNS for zero-configuration service discovery and applies zero-trust security principles to every agent interaction.
Core Function: Coordinates autonomous agents across distributed networks, providing service discovery, task distribution, secure transport, and authentication without manual configuration.
Quick Start
# Install
npm install @bluefly/agent-buildkit

# Deploy agent to mesh
buildkit agent:mesh deploy \
  --agent-id worker-1 \
  --agent-name "Task Worker" \
  --agent-type worker \
  --namespace production \
  --capabilities "task-execution,data-processing"

# Check mesh status
buildkit agent:mesh status

# Execute task on mesh
buildkit agent:mesh execute \
  --task-id task-001 \
  --task-type data-processing \
  --payload '{"data": "example"}'

# Discover agents
buildkit agent:mesh discover --namespace production
Key Features
- Automatic Service Discovery: Tailscale MagicDNS integration for zero-configuration agent discovery
- Intelligent Task Routing: Capability-based agent matching with multiple load balancing strategies
- Secure Transport: Agent-to-agent communication over the Tailscale encrypted network
- Zero-Trust Authentication: JWT-based authentication with ACL policies per agent type
- Load Balancing: Round-robin, least-loaded, capability-match, and random strategies
- Health Monitoring: Automatic heartbeat checking and health status tracking
- Fault Tolerance: Automatic retry logic, failover, and task reassignment
- Multi-Namespace Support: Isolated agent groups for different environments
Wiki Navigation
Core Documentation
- Architecture - Mesh networking and service discovery architecture
- Deployment Guide - Production deployment strategies
- Development Guide - Local development and contribution
Advanced Topics
- Tailscale Integration - MagicDNS and service discovery
- Security Model - Zero-trust authentication and ACL policies
- Load Balancing - Task distribution strategies
- Agent Types - Orchestrator, worker, monitor, integrator, governor, critic
CLI Reference
- Deploy Command - Register agents in the mesh
- Status Command - Monitor mesh health and agents
- Execute Command - Submit tasks to the mesh
- Discover Command - Find agents in the network
Architecture Overview
┌──────────────────────────────────────────────────────────────┐
│                      Agent Mesh Network                      │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐  │
│  │ Orchestrator │────▶│ Coordinator  │────▶│   Workers    │  │
│  │    Agent     │     │   Service    │     │    (Pool)    │  │
│  └──────────────┘     └──────────────┘     └──────────────┘  │
│         │                    │                    │          │
│         │                    │                    │          │
│  ┌──────▼───────┐     ┌──────▼───────┐     ┌──────▼───────┐  │
│  │  Discovery   │     │  Transport   │     │     Auth     │  │
│  │   Service    │     │   Service    │     │   Service    │  │
│  └──────────────┘     └──────────────┘     └──────────────┘  │
│         │                    │                    │          │
│  ┌──────▼────────────────────▼────────────────────▼───────┐  │
│  │            Tailscale MagicDNS Network Layer            │  │
│  │      (Zero-Config Service Discovery + Encryption)      │  │
│  └────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────┘
Agent Types
Orchestrator
- Role: Master coordinator for complex workflows
- Permissions: Full access to all resources and actions
- Use Cases: Multi-agent task orchestration, workflow management
Worker
- Role: Task execution and data processing
- Permissions: Read and execute on tasks/results
- Use Cases: CPU-intensive tasks, data transformations, batch processing
Monitor
- Role: Read-only observability and metrics collection
- Permissions: Read-only access to all resources
- Use Cases: Health monitoring, metrics aggregation, alerting
Integrator
- Role: External system integration
- Permissions: Read/write access to integrations and data
- Use Cases: API integrations, data pipelines, ETL processes
Governor
- Role: Policy and governance management
- Permissions: Admin access to policies, ACLs, audit logs
- Use Cases: Security policy management, compliance, auditing
Critic
- Role: Quality assurance and evaluation
- Permissions: Read/write access to reviews, feedback, and metrics
- Use Cases: Output validation, quality scoring, feedback loops
Core Services
Discovery Service
Automatic agent registration and discovery using Tailscale MagicDNS:
- Agent registration with identity and capabilities
- Namespace-based isolation
- Health checking and heartbeat monitoring
- Capability-based filtering
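The Discovery Service responsibilities above can be pictured with a small in-memory registry. This is only a sketch under stated assumptions: `AgentRecord`, `Registry`, and the 30-second `heartbeat_timeout` are hypothetical names and values chosen for illustration, not the package's actual API or defaults, and the real service resolves agents through MagicDNS rather than a local dictionary.

```python
import time
from dataclasses import dataclass, field


@dataclass
class AgentRecord:
    """Identity and capabilities an agent registers with (hypothetical shape)."""
    agent_id: str
    agent_type: str
    namespace: str
    capabilities: set
    last_heartbeat: float = field(default_factory=time.monotonic)


class Registry:
    """In-memory stand-in for the Discovery Service (illustrative only)."""

    def __init__(self, heartbeat_timeout=30.0):
        # heartbeat_timeout is an assumed value, not a documented default.
        self.heartbeat_timeout = heartbeat_timeout
        self._agents = {}  # keyed by (namespace, agent_id)

    def register(self, record):
        self._agents[(record.namespace, record.agent_id)] = record

    def heartbeat(self, namespace, agent_id, now=None):
        record = self._agents[(namespace, agent_id)]
        record.last_heartbeat = time.monotonic() if now is None else now

    def is_healthy(self, record, now=None):
        now = time.monotonic() if now is None else now
        return now - record.last_heartbeat <= self.heartbeat_timeout

    def discover(self, namespace, capability=None, now=None):
        """Return healthy agents in one namespace, optionally filtered by capability."""
        return [
            r for r in self._agents.values()
            if r.namespace == namespace
            and (capability is None or capability in r.capabilities)
            and self.is_healthy(r, now)
        ]
```

Keying records by `(namespace, agent_id)` keeps agents in different namespaces invisible to each other, and an agent that stops sending heartbeats simply drops out of `discover` results once the timeout elapses.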
Coordinator Service
Intelligent task distribution and management:
- Task queue management
- Agent capability matching
- Load balancing strategies (round-robin, least-loaded, capability-match, random)
- Fault tolerance and retry logic
- Task status tracking and result retrieval
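Two of the Coordinator Service duties above, queueing by priority and reassigning a failed task to another agent, can be sketched as follows. Everything here is hypothetical: `TaskQueue`, `run_with_failover`, the `PRIORITY` levels other than `high`, and the `max_attempts` value are assumptions made for illustration, not the coordinator's actual implementation.

```python
import heapq
import itertools

# Only "high" appears in the CLI examples; "normal" and "low" are assumed levels.
PRIORITY = {"high": 0, "normal": 1, "low": 2}


class TaskQueue:
    """Priority queue that keeps submission order within a priority level."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserving FIFO order

    def submit(self, task_id, priority="normal"):
        heapq.heappush(self._heap, (PRIORITY[priority], next(self._counter), task_id))

    def next_task(self):
        return heapq.heappop(self._heap)[2] if self._heap else None


def run_with_failover(task_id, agents, execute, max_attempts=3):
    """Try candidate agents in order; on failure, reassign to the next one.

    `execute(agent, task_id)` is a caller-supplied function (hypothetical)
    that raises on failure and returns the task result on success.
    """
    errors = []
    for agent in agents[:max_attempts]:
        try:
            return agent, execute(agent, task_id)
        except Exception as exc:  # broad on purpose: any failure triggers failover
            errors.append((agent, exc))
    raise RuntimeError(f"task {task_id} failed on {len(errors)} agent(s)")
```

The candidate list passed to `run_with_failover` would normally come from capability matching and the chosen load-balancing strategy, so failover falls through to the next-best agent rather than an arbitrary one.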
Transport Service
Secure agent-to-agent communication:
- HTTP/HTTPS over the Tailscale encrypted network
- Request/response patterns
- Streaming support for large payloads
- Broadcast messaging
- Connection pooling and retry logic
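The retry logic mentioned for the Transport Service is commonly implemented as exponential backoff. The sketch below shows that general technique only; `call_with_retry`, the retry count, and the base delay are assumptions for illustration, and the source does not state which backoff policy the transport actually uses.

```python
import time


def call_with_retry(send, retries=3, base_delay=0.1, sleep=time.sleep):
    """Call `send()` until it succeeds, doubling the wait after each failure.

    `send` is any zero-argument callable that performs one request and
    raises ConnectionError when the peer is unreachable (hypothetical contract).
    """
    for attempt in range(retries + 1):
        try:
            return send()
        except ConnectionError:
            if attempt == retries:
                raise  # out of retries: surface the last error to the caller
            sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...
```

Passing `sleep` as a parameter keeps the helper testable without real delays; in production code, adding random jitter to each delay is a common refinement to avoid synchronized retries.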
Auth Service
Zero-trust authentication and authorization:
- JWT-based agent authentication
- ACL policies per agent type
- Permission-based access control
- Security audit logging
- Token revocation
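To make the token lifecycle above concrete, here is a minimal HS256 JWT sketch using only the Python standard library. It is illustrative, not the Auth Service's implementation (which, per the Technology Stack, uses the `jsonwebtoken` package): the claim names `agentType` and `namespace`, the one-hour default lifetime, and the set-based revocation list are all assumptions.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(secret, agent_id, agent_type, namespace, ttl_seconds=3600, now=None):
    """Sign a JWT carrying the agent's identity (claim names are assumed)."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": agent_id, "agentType": agent_type,
              "namespace": namespace, "iat": now, "exp": now + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(secret, token, revoked=frozenset(), now=None):
    """Return the claims if the token is authentic, unexpired, and not revoked."""
    now = int(time.time()) if now is None else now
    try:
        header_b64, claims_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unsupported algorithm")
    expected = hmac.new(secret, f"{header_b64}.{claims_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    # compare_digest avoids leaking signature bytes through timing differences.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(claims_b64))
    if now >= claims["exp"]:
        raise ValueError("token expired")
    if token in revoked:  # revocation list lookup (assumed to be a simple set)
        raise ValueError("token revoked")
    return claims
```

Checking the revocation set on every verification is what makes revocation take effect immediately, at the cost of a shared lookup that a purely stateless JWT check would not need.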
Performance Metrics
- Service Discovery: <100ms agent lookup time
- Task Routing: <50ms agent selection and assignment
- Transport Latency: <10ms agent-to-agent (within Tailscale network)
- Failover Time: <5s automatic task reassignment
- Concurrent Tasks: 10,000+ tasks per coordinator instance
- Agent Scalability: 1,000+ agents per mesh network
CLI Commands
Deploy Agent
buildkit agent:mesh deploy \
  --agent-id worker-1 \
  --agent-name "Task Worker" \
  --agent-type worker \
  --namespace production \
  --capabilities "task-execution,data-processing" \
  --port 3000
Check Status
buildkit agent:mesh status \
  --namespace production \
  --health healthy
Execute Task
buildkit agent:mesh execute \
  --task-id task-001 \
  --task-type data-processing \
  --payload '{"data": "example"}' \
  --priority high \
  --timeout 300000
Discover Agents
buildkit agent:mesh discover \
  --namespace production \
  --agent-type worker \
  --capability task-execution
Generate Auth Token
buildkit agent:mesh auth \
  --agent-id worker-1 \
  --agent-type worker \
  --namespace production \
  --capabilities "task-execution"
View Workload
buildkit agent:mesh workload --agent-id worker-1
Load Balancing Strategies
Round-Robin
- Distributes tasks evenly across all agents
- Simple and predictable distribution
- Best for homogeneous agent pools
Least-Loaded
- Routes tasks to agents with fewest active tasks
- Optimizes for balanced workload distribution
- Best for heterogeneous environments
Capability-Match
- Prefers agents with exact capability matches
- Optimizes for specialized task execution
- Best for diverse task requirements
Random
- Randomly selects from available agents
- Simple and stateless
- Best for testing and development
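The four strategies above can be compared side by side in a short sketch. The `Agent` and `Balancer` names are hypothetical, and the capability-match rule shown here (prefer the agent with the fewest capabilities beyond those required) is one plausible reading of "exact capability matches", not a confirmed description of the coordinator's scoring.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class Agent:
    agent_id: str
    capabilities: frozenset
    active_tasks: int = 0


class Balancer:
    """Selects one agent per task using a named strategy (illustrative only)."""

    def __init__(self, strategy="round-robin", rng=None):
        self.strategy = strategy
        self._turn = itertools.count()      # round-robin position
        self._rng = rng or random.Random()  # injectable for reproducible tests

    def select(self, agents, required=frozenset()):
        required = frozenset(required)
        # Every strategy first narrows to agents that can do the task at all.
        candidates = [a for a in agents if required <= a.capabilities]
        if not candidates:
            return None
        if self.strategy == "round-robin":
            return candidates[next(self._turn) % len(candidates)]
        if self.strategy == "least-loaded":
            return min(candidates, key=lambda a: a.active_tasks)
        if self.strategy == "capability-match":
            # Assumed scoring: zero extra capabilities means an exact match.
            return min(candidates, key=lambda a: len(a.capabilities - required))
        if self.strategy == "random":
            return self._rng.choice(candidates)
        raise ValueError(f"unknown strategy: {self.strategy}")
```

Note that round-robin and random ignore current load entirely, which is why they suit homogeneous pools and test setups, while least-loaded needs an up-to-date `active_tasks` count for each agent.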
Security Model
Network Layer
- Tailscale Encryption: WireGuard-based end-to-end encryption
- Zero-Configuration: MagicDNS automatic service discovery
- Network Isolation: Namespace-based logical separation
Application Layer
- JWT Authentication: Signed tokens with expiration
- ACL Policies: Role-based access control per agent type
- Audit Logging: Comprehensive security event tracking
- Token Revocation: Immediate access termination
Default ACL Policies
| Agent Type | Allowed Actions | Allowed Resources | Denied Actions |
|---|---|---|---|
| Orchestrator | * (all) | * (all) | None |
| Worker | read, execute | tasks, results | admin |
| Monitor | read | * (all) | write, execute, admin |
| Integrator | read, write | integrations, data | admin |
| Governor | read, write, admin | policies, acl, audit | None |
| Critic | read, write | reviews, feedback, metrics | admin |
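The table above maps directly onto a small lookup structure. The sketch below encodes the same rows and evaluates a request against them; the `Policy` type, the `is_allowed` function, and the rules that denied actions take precedence and unknown agent types are rejected are assumptions about evaluation order, which the table itself does not specify.

```python
from dataclasses import dataclass

ALL = "*"


@dataclass(frozen=True)
class Policy:
    allowed_actions: frozenset
    allowed_resources: frozenset
    denied_actions: frozenset = frozenset()


# One entry per row of the Default ACL Policies table.
DEFAULT_POLICIES = {
    "orchestrator": Policy(frozenset({ALL}), frozenset({ALL})),
    "worker": Policy(frozenset({"read", "execute"}),
                     frozenset({"tasks", "results"}),
                     frozenset({"admin"})),
    "monitor": Policy(frozenset({"read"}), frozenset({ALL}),
                      frozenset({"write", "execute", "admin"})),
    "integrator": Policy(frozenset({"read", "write"}),
                         frozenset({"integrations", "data"}),
                         frozenset({"admin"})),
    "governor": Policy(frozenset({"read", "write", "admin"}),
                       frozenset({"policies", "acl", "audit"})),
    "critic": Policy(frozenset({"read", "write"}),
                     frozenset({"reviews", "feedback", "metrics"}),
                     frozenset({"admin"})),
}


def is_allowed(agent_type, action, resource):
    """Return True only if the policy permits both the action and the resource."""
    policy = DEFAULT_POLICIES.get(agent_type)
    if policy is None:
        return False  # assumed default-deny for unknown agent types
    if action in policy.denied_actions:
        return False  # assumed: an explicit deny wins over any allow
    action_ok = ALL in policy.allowed_actions or action in policy.allowed_actions
    resource_ok = ALL in policy.allowed_resources or resource in policy.allowed_resources
    return action_ok and resource_ok
```

For example, `is_allowed("worker", "execute", "tasks")` is true, while `is_allowed("worker", "read", "policies")` is false because `policies` is outside the worker's allowed resources.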
Integration Examples
Python Agent
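import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

def process_task(payload):
    """Placeholder task handler; replace with your own processing logic."""
    return payload

# Agent registration: announce this agent and its capabilities to the mesh
response = requests.post('http://agent-mesh:3000/mesh/register', json={
    'agentId': 'python-worker-1',
    'agentName': 'Python Data Processor',
    'agentType': 'worker',
    'namespace': 'production',
    'capabilities': ['python', 'data-processing', 'ml'],
    'port': 8080
}, timeout=10)
response.raise_for_status()  # fail fast if registration was rejected

# Task execution endpoint: the mesh delivers tasks to this route
@app.route('/mesh/rpc', methods=['POST'])
def execute_task():
    task = request.get_json()
    result = process_task(task['payload'])
    return jsonify({'success': True, 'result': result})

if __name__ == '__main__':
    # Listen on the same port that was registered above
    app.run(host='0.0.0.0', port=8080)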
Node.js Agent
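import axios from 'axios';
import express from 'express';

const app = express();
app.use(express.json()); // parse JSON request bodies into req.body

// Agent registration (top-level await requires an ES module)
await axios.post('http://agent-mesh:3000/mesh/register', {
  agentId: 'node-worker-1',
  agentName: 'Node.js API Processor',
  agentType: 'worker',
  namespace: 'production',
  capabilities: ['nodejs', 'api-integration', 'json'],
  port: 8080
});

// Task execution endpoint: the mesh delivers tasks to this route
app.post('/mesh/rpc', async (req, res) => {
  try {
    const task = req.body;
    // processTask is your own task handler (not shown here)
    const result = await processTask(task.payload);
    res.json({ success: true, result });
  } catch (err) {
    // Error response shape is an assumption, not a documented contract
    res.status(500).json({ success: false, error: err.message });
  }
});

app.listen(8080); // must match the port registered above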
Technology Stack
- Runtime: Node.js 20+, TypeScript 5.0+
- Service Discovery: Tailscale MagicDNS
- Authentication: JWT (jsonwebtoken)
- Validation: Zod schemas
- HTTP Client: Axios with connection pooling
- CLI: Commander.js with chalk styling
- Logging: Structured logging with log levels
Related Projects
- @bluefly/agent-router - LLM request routing
- @bluefly/agent-protocol - MCP protocol implementation
- @bluefly/agent-brain - Knowledge graph and vector search
- @bluefly/agent-buildkit - Enterprise agent framework
Quick Links
- Repository: https://gitlab.bluefly.io/llm/npm/agent-buildkit
- Issues: https://gitlab.bluefly.io/llm/npm/agent-buildkit/-/issues
- CI/CD: https://gitlab.bluefly.io/llm/npm/agent-buildkit/-/pipelines
- Package Registry: https://gitlab.bluefly.io/llm/npm/agent-buildkit/-/packages
Support
- Issues: https://gitlab.bluefly.io/llm/npm/agent-buildkit/-/issues
- Documentation: This wiki
- Team: LLM Platform Team (llm-platform@bluefly.io)
Last Updated: 2025-11-02
Maintainer: LLM Platform Team
License: MIT