# Vector Database Architecture

Qdrant integration for embeddings and similarity search.

## Overview
The LLM Platform uses Qdrant as its vector database for storing embeddings, performing similarity search, and enabling semantic retrieval across documents, code, and knowledge graphs.
- **Technology**: Qdrant
- **Ports**: 6333 (HTTP), 6334 (gRPC)
- **URL**: http://localhost:6333
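Before wiring anything up, it is worth confirming the instance is reachable. A minimal connectivity check, assuming the default local URL above:

```typescript
import { QdrantClient } from '@qdrant/qdrant-js'

const client = new QdrantClient({ url: 'http://localhost:6333' })

// Listing collections is a cheap round trip that verifies connectivity (and auth, if enabled)
const { collections } = await client.getCollections()
console.log('Qdrant reachable; collections:', collections.map((c) => c.name))
```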
## Architecture

```mermaid
graph TB
    subgraph "Application Layer"
        Drupal[Drupal Modules]
        Agents[AI Agents]
        API[LLM Gateway]
    end

    subgraph "Embedding Layer"
        OpenAI[OpenAI<br/>text-embedding-3]
        Local[Local Models<br/>sentence-transformers]
    end

    subgraph "Qdrant Core"
        HTTP[HTTP API<br/>Port 6333]
        gRPC[gRPC API<br/>Port 6334]
        Search[Vector Search]
        Index[HNSW Index]
    end

    subgraph "Collections"
        Docs[Documents<br/>1536 dims]
        Code[Code<br/>1536 dims]
        Knowledge[Knowledge Graph<br/>384 dims]
        Agent[Agent Memory<br/>1536 dims]
    end

    subgraph "Storage"
        Disk[(Persistent Storage)]
        Memory[(In-Memory Cache)]
    end

    Drupal --> OpenAI
    Agents --> OpenAI
    API --> Local
    OpenAI --> HTTP
    Local --> gRPC
    HTTP --> Search
    gRPC --> Search
    Search --> Index
    Index --> Docs
    Index --> Code
    Index --> Knowledge
    Index --> Agent
    Docs --> Disk
    Code --> Disk
    Knowledge --> Memory
    Agent --> Memory
```
## Collections

### Document Collection

**Purpose**: Store document embeddings for semantic search

```typescript
import { QdrantClient } from '@qdrant/qdrant-js'

const client = new QdrantClient({
  url: 'http://localhost:6333'
})

// Create collection
await client.createCollection('documents', {
  vectors: {
    size: 1536, // OpenAI text-embedding-3-small
    distance: 'Cosine'
  },
  optimizers_config: {
    indexing_threshold: 20000
  }
})

// Insert document (point IDs must be unsigned integers or UUIDs)
await client.upsert('documents', {
  points: [
    {
      id: 123,
      vector: embeddingVector,
      payload: {
        title: 'LLM Platform Architecture',
        content: 'The LLM Platform is...',
        type: 'documentation',
        created_at: new Date().toISOString()
      }
    }
  ]
})

// Search similar documents
const results = await client.search('documents', {
  vector: queryEmbedding,
  limit: 10,
  with_payload: true,
  filter: {
    must: [
      {
        key: 'type',
        match: { value: 'documentation' }
      }
    ]
  }
})
```
### Code Collection

**Purpose**: Store code embeddings for semantic code search

```typescript
// Create collection
await client.createCollection('code', {
  vectors: {
    size: 1536,
    distance: 'Cosine'
  }
})

// Insert code (point IDs must be unsigned integers or UUIDs)
await client.upsert('code', {
  points: [
    {
      id: 456,
      vector: codeEmbedding,
      payload: {
        file_path: 'src/api/users.ts',
        function_name: 'getUserById',
        language: 'typescript',
        description: 'Fetches user by ID from database',
        lines: [10, 25]
      }
    }
  ]
})

// Semantic code search
const codeResults = await client.search('code', {
  vector: queryEmbedding,
  limit: 5,
  filter: {
    must: [
      {
        key: 'language',
        match: { value: 'typescript' }
      }
    ]
  }
})
```
### Knowledge Graph Collection

**Purpose**: Store entity and relationship embeddings

```typescript
// Create collection
await client.createCollection('knowledge', {
  vectors: {
    size: 384, // all-MiniLM-L6-v2
    distance: 'Cosine'
  }
})

// Insert entity (point IDs must be unsigned integers or UUIDs)
await client.upsert('knowledge', {
  points: [
    {
      id: 789,
      vector: entityEmbedding,
      payload: {
        entity_name: 'Agent BuildKit',
        entity_type: 'software',
        observations: ['Enterprise autonomous agent platform'],
        relations: ['uses:Phoenix Arize', 'integrates:GitLab CE']
      }
    }
  ]
})
```
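Entity retrieval follows the same search pattern as the other collections, with one caveat: the query must be embedded with the same 384-dim model the collection was created for (all-MiniLM-L6-v2). A sketch, where `entityQueryEmbedding` stands in for such an embedding:

```typescript
// Find entities semantically related to a query (384-dim vector required)
const entities = await client.search('knowledge', {
  vector: entityQueryEmbedding, // assumed: produced by all-MiniLM-L6-v2
  limit: 5,
  with_payload: true,
  filter: {
    must: [
      { key: 'entity_type', match: { value: 'software' } }
    ]
  }
})
```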
### Agent Memory Collection

**Purpose**: Store agent conversation history and context

```typescript
// Create collection
await client.createCollection('agent_memory', {
  vectors: {
    size: 1536,
    distance: 'Cosine'
  }
})

// Insert conversation (point IDs must be unsigned integers or UUIDs)
await client.upsert('agent_memory', {
  points: [
    {
      id: 1,
      vector: conversationEmbedding,
      payload: {
        agent_id: 'tdd-enforcer-001',
        session_id: 'session-456',
        timestamp: new Date().toISOString(),
        messages: [
          { role: 'user', content: 'Run tests' },
          { role: 'assistant', content: 'Running tests...' }
        ],
        context: {
          files_modified: ['src/api/users.ts'],
          coverage: 85
        }
      }
    }
  ]
})
```
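Recall works the same way as document search: embed the agent's current task and filter on `agent_id` (and optionally `session_id`). A minimal sketch, where `currentQueryEmbedding` is assumed to be the embedding of the current task:

```typescript
// Retrieve the most relevant past context for one agent
const memories = await client.search('agent_memory', {
  vector: currentQueryEmbedding, // assumed: embedding of the agent's current query
  limit: 5,
  with_payload: true,
  filter: {
    must: [
      { key: 'agent_id', match: { value: 'tdd-enforcer-001' } }
    ]
  }
})
```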
## Vector Search Strategies

### Semantic Search

```typescript
import OpenAI from 'openai'

const openai = new OpenAI()

// Generate embedding for the query
const queryEmbedding = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'How do I deploy to Kubernetes?'
})

// Search documents
const results = await client.search('documents', {
  vector: queryEmbedding.data[0].embedding,
  limit: 10,
  score_threshold: 0.7 // Minimum similarity score
})
```
### Hybrid Search (Vector + Keyword)

```typescript
const results = await client.search('documents', {
  vector: queryEmbedding,
  limit: 10,
  filter: {
    must: [
      {
        key: 'content',
        // Full-text keyword filter; requires a 'text' payload index on 'content'
        match: { text: 'kubernetes deployment' }
      }
    ]
  }
})
```
### Multi-Vector Search

```typescript
// Run several query vectors in a single request via batch search
// (search() takes one vector; searchBatch() handles multiple queries)
const results = await client.searchBatch('documents', {
  searches: [
    { vector: queryEmbedding1, limit: 10, with_payload: true }, // primary query
    { vector: queryEmbedding2, limit: 10, with_payload: true } // secondary query
  ]
})
```
### Recommendation Search

```typescript
// Recommend similar items (IDs must reference existing points)
const results = await client.recommend('documents', {
  positive: [123, 456], // Like these
  negative: [789], // Not like this
  limit: 10
})
```
## Advanced Features

### Payload Indexing

```typescript
// Create indexed payload fields for faster filtering
await client.createPayloadIndex('documents', {
  field_name: 'type',
  field_schema: 'keyword'
})

await client.createPayloadIndex('documents', {
  field_name: 'created_at',
  field_schema: 'datetime'
})
```
### Quantization (Performance Optimization)

```typescript
// Enable scalar quantization to reduce memory usage
await client.updateCollection('documents', {
  quantization_config: {
    scalar: {
      type: 'int8',
      quantile: 0.99
    }
  }
})
```
### Batch Operations

```typescript
// Batch insert for performance
const points = Array.from({ length: 10000 }, (_, i) => ({
  id: i, // unsigned integer point IDs
  vector: generateEmbedding(),
  payload: { index: i }
}))

await client.upsert('documents', {
  points,
  wait: true // Wait for the operation to complete
})
```
## Client Libraries

### TypeScript

```typescript
import { QdrantClient } from '@qdrant/qdrant-js'
// The REST client cannot speak gRPC; use the dedicated gRPC package for port 6334
import { QdrantClient as QdrantGrpcClient } from '@qdrant/js-client-grpc'

const client = new QdrantClient({
  url: 'http://localhost:6333',
  apiKey: process.env.QDRANT_API_KEY
})

// Using gRPC for better performance
const grpcClient = new QdrantGrpcClient({
  host: 'localhost',
  port: 6334
})
```
### Python

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333")

# Create collection
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

# Search
results = client.search(
    collection_name="documents",
    query_vector=embedding,
    limit=10,
)
```
### PHP (Drupal)

```php
<?php

use GuzzleHttp\Client;

class QdrantClient {

  private $client;
  private $baseUrl = 'http://localhost:6333';

  public function __construct() {
    // Initialize the HTTP client used for all requests.
    $this->client = new Client();
  }

  public function search(string $collection, array $vector, int $limit = 10): array {
    $response = $this->client->post("{$this->baseUrl}/collections/{$collection}/points/search", [
      'json' => [
        'vector' => $vector,
        'limit' => $limit,
        'with_payload' => TRUE,
      ],
    ]);
    return json_decode($response->getBody(), TRUE);
  }

}
```
## Configuration

```yaml
# config/qdrant.yaml
qdrant:
  url: http://localhost:6333
  grpc_url: localhost:6334
  api_key: ${QDRANT_API_KEY}
  collections:
    documents:
      vector_size: 1536
      distance: cosine
      on_disk: true
    code:
      vector_size: 1536
      distance: cosine
      on_disk: true
    knowledge:
      vector_size: 384
      distance: cosine
      on_disk: false # Keep in memory
    agent_memory:
      vector_size: 1536
      distance: cosine
      on_disk: false
      ttl: 86400 # 24 hours
```
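Note that Qdrant has no built-in per-point TTL, so the `ttl` value above implies an external cleanup job. A minimal sketch of one, assuming the `timestamp` payload field written by the agent-memory inserts is indexed as `datetime`, and reusing the TypeScript client from earlier:

```typescript
// Delete agent_memory points older than the configured TTL (run on a schedule)
const ttlSeconds = 86400
const cutoff = new Date(Date.now() - ttlSeconds * 1000).toISOString()

await client.delete('agent_memory', {
  wait: true,
  filter: {
    must: [
      { key: 'timestamp', range: { lt: cutoff } } // datetime range filter
    ]
  }
})
```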
## Monitoring

### Metrics

```text
GET /metrics
```

Sample output:

```text
# Collection metrics
qdrant_collection_points_total{collection="documents"} 125000
qdrant_collection_segments{collection="documents"} 5

# Performance metrics
qdrant_search_duration_seconds{collection="documents",quantile="0.99"} 0.045
qdrant_search_requests_total{collection="documents",status="success"} 10000
```
### Health Check

```text
GET /health
```

Response:

```json
{
  "status": "ok",
  "version": "1.7.0"
}
```
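For a scripted liveness probe, polling that endpoint is enough. A minimal sketch, assuming the endpoint and response shape shown above:

```typescript
// Poll Qdrant health (endpoint and response shape as documented above)
const res = await fetch('http://localhost:6333/health')
if (!res.ok) throw new Error(`Qdrant unhealthy: HTTP ${res.status}`)
const { status, version } = await res.json()
console.log(`Qdrant ${version}: ${status}`)
```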
## Best Practices

### 1. Choose the Right Vector Size

- **1536**: OpenAI text-embedding-3-small (best quality/performance)
- **768**: sentence-transformers (good quality, faster)
- **384**: all-MiniLM-L6-v2 (fastest, lower quality)
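Whichever model you pick, the collection's `size` must match the embedding length exactly. With the text-embedding-3 models, OpenAI also accepts a `dimensions` parameter to shorten the output, e.g. to fit a 768-dim collection. A sketch, assuming the `openai` client from the semantic search example:

```typescript
// text-embedding-3-small emits 1536 dims by default, but can be shortened
const response = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'How do I deploy to Kubernetes?',
  dimensions: 768 // must equal the target collection's vector size
})
const vector = response.data[0].embedding // length 768
```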
### 2. Optimize for Scale

```typescript
// Use on-disk payload storage for large collections
await client.createCollection('large_collection', {
  vectors: { size: 1536, distance: 'Cosine' },
  on_disk_payload: true
})

// Enable quantization
await client.updateCollection('large_collection', {
  quantization_config: { scalar: { type: 'int8' } }
})
```
### 3. Batch Operations

```typescript
// Insert in batches of 100-1000
for (let i = 0; i < points.length; i += 100) {
  await client.upsert('collection', {
    points: points.slice(i, i + 100)
  })
}
```
### 4. Use Payload Indexes

```typescript
// Index frequently filtered fields
await client.createPayloadIndex('documents', {
  field_name: 'type',
  field_schema: 'keyword'
})
```
## Related Documentation

- System Overview
- Agent Tracer - Embedding analytics
- LLM Gateway - Embedding generation