# Vector Database Architecture

Qdrant integration for embeddings and similarity search.

## Overview
The LLM Platform uses Qdrant as its vector database for storing embeddings, performing similarity search, and enabling semantic retrieval across documents, code, and knowledge graphs.
- **Technology**: Qdrant
- **Ports**: 6333 (HTTP), 6334 (gRPC)
- **URL**: http://localhost:6333
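Before wiring anything up, it is worth confirming the instance is reachable. A minimal connectivity check, assuming the default local URL above:

```typescript
import { QdrantClient } from '@qdrant/qdrant-js'

const client = new QdrantClient({ url: 'http://localhost:6333' })

// Listing collections is a cheap round trip that verifies connectivity (and auth, if enabled)
const { collections } = await client.getCollections()
console.log('Qdrant reachable; collections:', collections.map((c) => c.name))
```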
## Architecture

```mermaid
graph TB
    subgraph "Application Layer"
        Drupal[Drupal Modules]
        Agents[AI Agents]
        API[LLM Gateway]
    end

    subgraph "Embedding Layer"
        OpenAI[OpenAI<br/>text-embedding-3]
        Local[Local Models<br/>sentence-transformers]
    end

    subgraph "Qdrant Core"
        HTTP[HTTP API<br/>Port 6333]
        gRPC[gRPC API<br/>Port 6334]
        Search[Vector Search]
        Index[HNSW Index]
    end

    subgraph "Collections"
        Docs[Documents<br/>1536 dims]
        Code[Code<br/>1536 dims]
        Knowledge[Knowledge Graph<br/>384 dims]
        Agent[Agent Memory<br/>1536 dims]
    end

    subgraph "Storage"
        Disk[(Persistent Storage)]
        Memory[(In-Memory Cache)]
    end

    Drupal --> OpenAI
    Agents --> OpenAI
    API --> Local
    OpenAI --> HTTP
    Local --> gRPC
    HTTP --> Search
    gRPC --> Search
    Search --> Index
    Index --> Docs
    Index --> Code
    Index --> Knowledge
    Index --> Agent
    Docs --> Disk
    Code --> Disk
    Knowledge --> Memory
    Agent --> Memory
```
## Collections

### Document Collection

**Purpose**: Store document embeddings for semantic search

```typescript
import { QdrantClient } from '@qdrant/qdrant-js'

const client = new QdrantClient({
  url: 'http://localhost:6333'
})

// Create collection
await client.createCollection('documents', {
  vectors: {
    size: 1536, // OpenAI text-embedding-3-small
    distance: 'Cosine'
  },
  optimizers_config: {
    indexing_threshold: 20000
  }
})

// Insert document (point IDs must be unsigned integers or UUIDs)
await client.upsert('documents', {
  points: [
    {
      id: 123,
      vector: embeddingVector,
      payload: {
        title: 'LLM Platform Architecture',
        content: 'The LLM Platform is...',
        type: 'documentation',
        created_at: new Date().toISOString()
      }
    }
  ]
})

// Search similar documents
const results = await client.search('documents', {
  vector: queryEmbedding,
  limit: 10,
  with_payload: true,
  filter: {
    must: [
      {
        key: 'type',
        match: { value: 'documentation' }
      }
    ]
  }
})
```
### Code Collection

**Purpose**: Store code embeddings for semantic code search

```typescript
// Create collection
await client.createCollection('code', {
  vectors: {
    size: 1536,
    distance: 'Cosine'
  }
})

// Insert code (point IDs must be unsigned integers or UUIDs)
await client.upsert('code', {
  points: [
    {
      id: 456,
      vector: codeEmbedding,
      payload: {
        file_path: 'src/api/users.ts',
        function_name: 'getUserById',
        language: 'typescript',
        description: 'Fetches user by ID from database',
        lines: [10, 25]
      }
    }
  ]
})

// Semantic code search
const codeResults = await client.search('code', {
  vector: queryEmbedding,
  limit: 5,
  filter: {
    must: [
      {
        key: 'language',
        match: { value: 'typescript' }
      }
    ]
  }
})
```
### Knowledge Graph Collection

**Purpose**: Store entity and relationship embeddings

```typescript
// Create collection
await client.createCollection('knowledge', {
  vectors: {
    size: 384, // all-MiniLM-L6-v2
    distance: 'Cosine'
  }
})

// Insert entity (point IDs must be unsigned integers or UUIDs)
await client.upsert('knowledge', {
  points: [
    {
      id: 789,
      vector: entityEmbedding,
      payload: {
        entity_name: 'Agent BuildKit',
        entity_type: 'software',
        observations: ['Enterprise autonomous agent platform'],
        relations: ['uses:Phoenix Arize', 'integrates:GitLab CE']
      }
    }
  ]
})
```
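Entity retrieval follows the same search pattern as the other collections, with one caveat: the query must be embedded with the same 384-dim model the collection was created for (all-MiniLM-L6-v2). A sketch, where `entityQueryEmbedding` stands in for such an embedding:

```typescript
// Find entities semantically related to a query (384-dim vector required)
const entities = await client.search('knowledge', {
  vector: entityQueryEmbedding, // assumed: produced by all-MiniLM-L6-v2
  limit: 5,
  with_payload: true,
  filter: {
    must: [
      { key: 'entity_type', match: { value: 'software' } }
    ]
  }
})
```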
### Agent Memory Collection

**Purpose**: Store agent conversation history and context

```typescript
// Create collection
await client.createCollection('agent_memory', {
  vectors: {
    size: 1536,
    distance: 'Cosine'
  }
})

// Insert conversation (point IDs must be unsigned integers or UUIDs)
await client.upsert('agent_memory', {
  points: [
    {
      id: 1,
      vector: conversationEmbedding,
      payload: {
        agent_id: 'tdd-enforcer-001',
        session_id: 'session-456',
        timestamp: new Date().toISOString(),
        messages: [
          { role: 'user', content: 'Run tests' },
          { role: 'assistant', content: 'Running tests...' }
        ],
        context: {
          files_modified: ['src/api/users.ts'],
          coverage: 85
        }
      }
    }
  ]
})
```
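Recall works the same way as document search: embed the agent's current task and filter on `agent_id` (and optionally `session_id`). A minimal sketch, where `currentQueryEmbedding` is assumed to be the embedding of the current task:

```typescript
// Retrieve the most relevant past context for one agent
const memories = await client.search('agent_memory', {
  vector: currentQueryEmbedding, // assumed: embedding of the agent's current query
  limit: 5,
  with_payload: true,
  filter: {
    must: [
      { key: 'agent_id', match: { value: 'tdd-enforcer-001' } }
    ]
  }
})
```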
## Vector Search Strategies

### Semantic Search

```typescript
import OpenAI from 'openai'

const openai = new OpenAI()

// Generate embedding for the query
const queryEmbedding = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'How do I deploy to Kubernetes?'
})

// Search documents
const results = await client.search('documents', {
  vector: queryEmbedding.data[0].embedding,
  limit: 10,
  score_threshold: 0.7 // Minimum similarity score
})
```
### Hybrid Search (Vector + Keyword)

```typescript
const results = await client.search('documents', {
  vector: queryEmbedding,
  limit: 10,
  filter: {
    must: [
      {
        key: 'content',
        // Full-text keyword filter; requires a 'text' payload index on 'content'
        match: { text: 'kubernetes deployment' }
      }
    ]
  }
})
```
### Multi-Vector Search

```typescript
// Run several query vectors in a single request via batch search
// (search() takes one vector; searchBatch() handles multiple queries)
const results = await client.searchBatch('documents', {
  searches: [
    { vector: queryEmbedding1, limit: 10, with_payload: true }, // primary query
    { vector: queryEmbedding2, limit: 10, with_payload: true } // secondary query
  ]
})
```
### Recommendation Search

```typescript
// Recommend similar items (IDs must reference existing points)
const results = await client.recommend('documents', {
  positive: [123, 456], // Like these
  negative: [789], // Not like this
  limit: 10
})
```
## Advanced Features

### Payload Indexing

```typescript
// Create indexed payload fields for faster filtering
await client.createPayloadIndex('documents', {
  field_name: 'type',
  field_schema: 'keyword'
})

await client.createPayloadIndex('documents', {
  field_name: 'created_at',
  field_schema: 'datetime'
})
```
### Quantization (Performance Optimization)

```typescript
// Enable scalar quantization to reduce memory usage
await client.updateCollection('documents', {
  quantization_config: {
    scalar: {
      type: 'int8',
      quantile: 0.99
    }
  }
})
```
### Batch Operations

```typescript
// Batch insert for performance
const points = Array.from({ length: 10000 }, (_, i) => ({
  id: i, // unsigned integer point IDs
  vector: generateEmbedding(),
  payload: { index: i }
}))

await client.upsert('documents', {
  points,
  wait: true // Wait for the operation to complete
})
```
## Client Libraries

### TypeScript

```typescript
import { QdrantClient } from '@qdrant/qdrant-js'
// The REST client cannot speak gRPC; use the dedicated gRPC package for port 6334
import { QdrantClient as QdrantGrpcClient } from '@qdrant/js-client-grpc'

const client = new QdrantClient({
  url: 'http://localhost:6333',
  apiKey: process.env.QDRANT_API_KEY
})

// Using gRPC for better performance
const grpcClient = new QdrantGrpcClient({
  host: 'localhost',
  port: 6334
})
```
### Python

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333")

# Create collection
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

# Search
results = client.search(
    collection_name="documents",
    query_vector=embedding,
    limit=10,
)
```
### PHP (Drupal)

```php
<?php

use GuzzleHttp\Client;

class QdrantClient {

  private $client;
  private $baseUrl = 'http://localhost:6333';

  public function __construct() {
    // Initialize the HTTP client used for all requests.
    $this->client = new Client();
  }

  public function search(string $collection, array $vector, int $limit = 10): array {
    $response = $this->client->post("{$this->baseUrl}/collections/{$collection}/points/search", [
      'json' => [
        'vector' => $vector,
        'limit' => $limit,
        'with_payload' => TRUE,
      ],
    ]);
    return json_decode($response->getBody(), TRUE);
  }

}
```
## Configuration

```yaml
# config/qdrant.yaml
qdrant:
  url: http://localhost:6333
  grpc_url: localhost:6334
  api_key: ${QDRANT_API_KEY}
  collections:
    documents:
      vector_size: 1536
      distance: cosine
      on_disk: true
    code:
      vector_size: 1536
      distance: cosine
      on_disk: true
    knowledge:
      vector_size: 384
      distance: cosine
      on_disk: false # Keep in memory
    agent_memory:
      vector_size: 1536
      distance: cosine
      on_disk: false
      ttl: 86400 # 24 hours
```
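Note that Qdrant has no built-in per-point TTL, so the `ttl` value above implies an external cleanup job. A minimal sketch of one, assuming the `timestamp` payload field written by the agent-memory inserts is indexed as `datetime`, and reusing the TypeScript client from earlier:

```typescript
// Delete agent_memory points older than the configured TTL (run on a schedule)
const ttlSeconds = 86400
const cutoff = new Date(Date.now() - ttlSeconds * 1000).toISOString()

await client.delete('agent_memory', {
  wait: true,
  filter: {
    must: [
      { key: 'timestamp', range: { lt: cutoff } } // datetime range filter
    ]
  }
})
```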
## Monitoring

### Metrics

```text
GET /metrics
```

Sample output:

```text
# Collection metrics
qdrant_collection_points_total{collection="documents"} 125000
qdrant_collection_segments{collection="documents"} 5

# Performance metrics
qdrant_search_duration_seconds{collection="documents",quantile="0.99"} 0.045
qdrant_search_requests_total{collection="documents",status="success"} 10000
```
### Health Check

```text
GET /health
```

Response:

```json
{
  "status": "ok",
  "version": "1.7.0"
}
```
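For a scripted liveness probe, polling that endpoint is enough. A minimal sketch, assuming the endpoint and response shape shown above:

```typescript
// Poll Qdrant health (endpoint and response shape as documented above)
const res = await fetch('http://localhost:6333/health')
if (!res.ok) throw new Error(`Qdrant unhealthy: HTTP ${res.status}`)
const { status, version } = await res.json()
console.log(`Qdrant ${version}: ${status}`)
```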
## Best Practices

### 1. Choose the Right Vector Size

- **1536**: OpenAI text-embedding-3-small (best quality/performance)
- **768**: sentence-transformers (good quality, faster)
- **384**: all-MiniLM-L6-v2 (fastest, lower quality)
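Whichever model you pick, the collection's `size` must match the embedding length exactly. With the text-embedding-3 models, OpenAI also accepts a `dimensions` parameter to shorten the output, e.g. to fit a 768-dim collection. A sketch, assuming the `openai` client from the semantic search example:

```typescript
// text-embedding-3-small emits 1536 dims by default, but can be shortened
const response = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'How do I deploy to Kubernetes?',
  dimensions: 768 // must equal the target collection's vector size
})
const vector = response.data[0].embedding // length 768
```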
### 2. Optimize for Scale

```typescript
// Use on-disk payload storage for large collections
await client.createCollection('large_collection', {
  vectors: { size: 1536, distance: 'Cosine' },
  on_disk_payload: true
})

// Enable quantization
await client.updateCollection('large_collection', {
  quantization_config: { scalar: { type: 'int8' } }
})
```
### 3. Batch Operations

```typescript
// Insert in batches of 100-1000
for (let i = 0; i < points.length; i += 100) {
  await client.upsert('collection', {
    points: points.slice(i, i + 100)
  })
}
```
### 4. Use Payload Indexes

```typescript
// Index frequently filtered fields
await client.createPayloadIndex('documents', {
  field_name: 'type',
  field_schema: 'keyword'
})
```
## Related Documentation

- System Overview
- Agent Tracer - Embedding analytics
- LLM Gateway - Embedding generation