Common Pitfalls
Common mistakes when working with the LLM Platform, BuildKit, and OSSA agents, and how to avoid each one.
Table of Contents
- Installation & Setup
- OSSA Agent Development
- BuildKit CLI Usage
- Drupal Development
- Workflow Orchestration
- Production Deployment
- Performance & Optimization
- Security
Installation & Setup
❌ Pitfall: Using Docker Desktop Instead of OrbStack (macOS)
Problem:
# Slow file sync, high CPU usage, poor performance
docker ps # Takes 5+ seconds
Solution:
# Switch to OrbStack, which is typically much faster for file sync on macOS
brew install orbstack
# Uninstall Docker Desktop
# macOS → Applications → Docker → Uninstall
# Verify OrbStack
docker ps # Should be instant
Why it matters: bind-mounted project files are a common bottleneck for Docker Desktop on macOS, and every Composer, Drush, and test run pays for it. OrbStack uses VirtIO-FS-based file sharing for near-native file access.
❌ Pitfall: Not Installing DDEV Addons
Problem:
ddev drush status
# Command not found: drush
Solution:
# Install platform-specific DDEV addons (one-time)
cd ~/Sites/LLM/llm-platform
./infrastructure/ddev-addons/install-addons.sh
# Now available:
ddev drush status
ddev tddai check
ddev git-safe commit
❌ Pitfall: Wrong PHP Version
Problem:
composer install
# Your requirements could not be resolved to an installable set of packages
# drupal/core requires php >=8.3
Solution:
# Check PHP version
php --version
# macOS: Install PHP 8.3
brew install php@8.3
brew link php@8.3 --force --overwrite
# Verify
php --version # Should show 8.3.x
❌ Pitfall: Missing Environment Variables
Problem:
buildkit agents deploy
# Error: GITLAB_TOKEN not set
Solution:
# Store tokens in ~/.tokens/ (readable only by your user)
mkdir -p ~/.tokens
chmod 700 ~/.tokens
echo "your-gitlab-token" > ~/.tokens/gitlab
chmod 600 ~/.tokens/gitlab
# Set environment variables
export GITLAB_URL="https://gitlab.bluefly.io"
export GITLAB_TOKEN=$(cat ~/.tokens/gitlab)
# Persist in shell profile
echo 'export GITLAB_URL="https://gitlab.bluefly.io"' >> ~/.zshrc
echo 'export GITLAB_TOKEN=$(cat ~/.tokens/gitlab)' >> ~/.zshrc
OSSA Agent Development
❌ Pitfall: Invalid OSSA Manifest
Problem:
# agent.ossa.yaml
ossaVersion: "0.2.4"
agent:
id: my-agent
name: My Agent
# Missing required fields!
Solution:
ossaVersion: "0.2.4"
agent:
id: my-agent # Required: DNS-1123 format
name: My Agent # Required: Human-readable name
version: "1.0.0" # Required: Semantic version
role: worker # Required: worker, governor, critic, observer
runtime: # Required
type: local # Required
node:
version: "20.x"
entrypoint: "dist/index.js"
capabilities: # Required: At least one
- name: process_data
description: Process data
input_schema: { type: object }
output_schema: { type: object }
Validate:
ossa validate agent.ossa.yaml
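A quick pre-check for the required fields listed above can catch omissions before the validator runs. The sketch below is only an illustration: `missingManifestFields` is a hypothetical helper, it assumes the manifest has already been parsed into an object, and it assumes `runtime` and `capabilities` are nested under `agent`. It is not a substitute for `ossa validate`.

```javascript
// Hypothetical pre-check; the real schema is enforced by `ossa validate`.
// Assumption: runtime and capabilities live under the `agent` key.
const DNS_1123_LABEL = /^[a-z0-9]([-a-z0-9]*[a-z0-9])?$/;
const KNOWN_ROLES = ['worker', 'governor', 'critic', 'observer'];

function missingManifestFields(manifest) {
  const problems = [];
  const agent = (manifest && manifest.agent) || {};

  if (!manifest || !manifest.ossaVersion) problems.push('ossaVersion');
  for (const field of ['id', 'name', 'version', 'role']) {
    if (!agent[field]) problems.push(`agent.${field}`);
  }
  // DNS-1123 labels: lowercase alphanumerics and '-', at most 63 characters
  if (agent.id && (!DNS_1123_LABEL.test(agent.id) || agent.id.length > 63)) {
    problems.push('agent.id (not DNS-1123)');
  }
  if (agent.role && !KNOWN_ROLES.includes(agent.role)) {
    problems.push('agent.role (unknown role)');
  }
  if (!agent.runtime || !agent.runtime.type) problems.push('agent.runtime.type');
  if (!Array.isArray(agent.capabilities) || agent.capabilities.length === 0) {
    problems.push('agent.capabilities');
  }
  return problems;
}
```

Run against the "Problem" manifest above, this reports the four fields that were left out; an empty array means the basic shape looks right.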
❌ Pitfall: Missing Capability Input/Output Schemas
Problem:
capabilities:
  - name: validate_code
    description: Validate code
    # Missing input_schema and output_schema!
Impact: Agents can't communicate, workflow orchestration breaks, no type safety.
Solution:
capabilities:
  - name: validate_code
    description: Validate code quality
    input_schema:
      type: object
      properties:
        files:
          type: array
          items:
            type: string
          description: List of file paths
        standards:
          type: string
          enum: [drupal, javascript, python]
      required: [files]
    output_schema:
      type: object
      properties:
        valid:
          type: boolean
        violations:
          type: array
          items:
            type: object
        summary:
          type: string
      required: [valid]
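To see what the schemas buy you, here is a minimal sketch of checking a payload against the `input_schema` above. `checkAgainstSchema` is a hypothetical helper that only understands `required`, top-level `type`, and `enum`; a real agent would more likely use a full JSON Schema validator.

```javascript
// Minimal illustration only: not a complete JSON Schema implementation.
function typeMatches(expected, value) {
  if (expected === 'array') return Array.isArray(value);
  if (expected === 'object') {
    return value !== null && typeof value === 'object' && !Array.isArray(value);
  }
  return typeof value === expected; // covers string, boolean, number
}

function checkAgainstSchema(schema, payload) {
  const errors = [];
  for (const key of schema.required || []) {
    if (!(key in payload)) errors.push(`missing required property: ${key}`);
  }
  for (const [key, rule] of Object.entries(schema.properties || {})) {
    if (!(key in payload)) continue; // optional and absent
    const value = payload[key];
    if (rule.type && !typeMatches(rule.type, value)) {
      errors.push(`${key}: expected ${rule.type}`);
    }
    if (rule.enum && !rule.enum.includes(value)) {
      errors.push(`${key}: must be one of ${rule.enum.join(', ')}`);
    }
  }
  return errors;
}

// The input_schema from the solution above, as a plain object
const validateCodeInput = {
  type: 'object',
  properties: {
    files: { type: 'array', items: { type: 'string' } },
    standards: { type: 'string', enum: ['drupal', 'javascript', 'python'] },
  },
  required: ['files'],
};
```

An empty result from `checkAgainstSchema(validateCodeInput, payload)` means the payload passed these basic checks; anything else is a list of human-readable reasons the caller can return with a 400.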
❌ Pitfall: Not Handling Agent Errors
Problem:
// Agent crashes on error
app.post('/capabilities/validate', async (req, res) => {
const result = await validateCode(req.body.files); // Might throw!
res.json(result);
});
Solution:
app.post('/capabilities/validate', async (req, res) => {
try {
const { files, standards = 'javascript' } = req.body;
if (!files || !Array.isArray(files)) {
return res.status(400).json({
error: 'Invalid input: files array required',
code: 'INVALID_INPUT'
});
}
const result = await validateCode(files, standards);
res.json(result);
} catch (error) {
logger.error('Validation failed', { error: error.message, stack: error.stack });
res.status(500).json({
error: 'Validation failed',
code: 'VALIDATION_ERROR',
message: error.message
});
}
});
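Repeating the same try/catch in every route gets tedious. In Express 4, a rejected promise from an `async` handler is not forwarded to the error middleware on its own, so a small wrapper is a common way to centralize this. The sketch below assumes an Express-style `(req, res, next)` signature; `asyncHandler` is an illustrative name, not part of the platform.

```javascript
// Wraps an async handler so that a thrown error or rejected promise
// ends up in next(err) instead of becoming an unhandled rejection.
function asyncHandler(handler) {
  return (req, res, next) => {
    Promise.resolve()
      .then(() => handler(req, res, next))
      .catch(next);
  };
}
```

A route would then be registered as `app.post('/capabilities/validate', asyncHandler(async (req, res) => { ... }))`, with a single error-handling middleware turning failures into 500 responses.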
❌ Pitfall: Ignoring Agent Health Checks
Problem:
# Kubernetes kills agent repeatedly
kubectl get pods -n agents
# agent-pod 0/1 CrashLoopBackOff
Solution:
// Add health check endpoint
app.get('/health', (req, res) => {
const health = {
status: 'ok',
agent: 'my-agent',
version: '1.0.0',
uptime: process.uptime(),
memory: process.memoryUsage(),
};
res.json(health);
});
// Add readiness check
app.get('/ready', async (req, res) => {
try {
// Check dependencies
await checkDatabaseConnection();
await checkExternalServices();
res.json({ ready: true });
} catch (error) {
res.status(503).json({ ready: false, error: error.message });
}
});
Kubernetes config:
livenessProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 3000
  initialDelaySeconds: 15
  periodSeconds: 5
BuildKit CLI Usage
❌ Pitfall: Not Checking Agent Status Before Deployment
Problem:
buildkit agents deploy my-agent
# Deploys broken agent to production!
Solution:
# Always validate first
buildkit ossa validate agent.ossa.yaml
# Test locally
buildkit agents start my-agent --local
# Run health check
curl http://localhost:3000/health
# Then deploy
buildkit agents deploy my-agent --namespace agents
❌ Pitfall: Hardcoding Configuration
Problem:
// Hardcoded values
const DATABASE_URL = 'postgresql://user:password@localhost:5432/db';
const API_KEY = 'sk-1234567890';
Solution:
// Use environment variables
const DATABASE_URL = process.env.DATABASE_URL;
const API_KEY = process.env.API_KEY;
// Validate at startup
if (!DATABASE_URL || !API_KEY) {
console.error('Missing required environment variables');
process.exit(1);
}
Set in Kubernetes:
env:
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef:
        name: db-credentials
        key: url
  - name: API_KEY
    valueFrom:
      secretKeyRef:
        name: api-credentials
        key: key
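When an agent needs more than two variables, it helps to report every missing name at once instead of failing on the first. A minimal sketch, assuming the caller passes in `process.env` (or a test double); `loadConfig` is a hypothetical helper name.

```javascript
// Collects all missing variables before failing, so one restart is enough
// to fix the configuration.
function loadConfig(env, requiredNames) {
  const missing = requiredNames.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  // Return only the requested keys, frozen so later code cannot mutate them
  return Object.freeze(
    Object.fromEntries(requiredNames.map((name) => [name, env[name]]))
  );
}
```

At startup this would be called as `loadConfig(process.env, ['DATABASE_URL', 'API_KEY'])`, exiting with a non-zero status if it throws.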
❌ Pitfall: Not Using BuildKit Golden Commands
Problem:
# Doing manual work that BuildKit automates
grep -r "TODO" src/
find . -name "*.ts" -exec eslint {} \;
git add . && git commit -m "Update"
Solution:
# Use BuildKit golden commands instead
buildkit golden audit # Comprehensive security + quality audit
buildkit golden fix # Auto-fix issues
buildkit golden test # Run all tests
buildkit golden sync # Sync GitLab (issues + wiki)
buildkit golden deploy --env dev # Deploy with checks
Drupal Development
❌ Pitfall: Editing Composer-Managed Modules
Problem:
# Editing files in web/modules/custom/
cd /Users/flux423/Sites/LLM/llm-platform/web/modules/custom/llm
nano llm.module # Changes will be LOST on composer install!
Solution:
# Edit source files instead
cd /Users/flux423/Sites/LLM/all_drupal_custom/modules/llm
nano llm.module
# Sync to llm-platform
buildkit drupal sync --modules
# Or manually
cd /Users/flux423/Sites/LLM/llm-platform
composer update drupal/llm
Why: web/modules/custom/* is managed by Composer and will be overwritten.
❌ Pitfall: Not Clearing Drupal Cache
Problem:
# Made changes but don't see them
# Updated routing, added service, changed config
Solution:
# Always clear cache after changes
ddev drush cr
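# If changes still don't appear, restart the containers and rebuild again
# (a restart on its own does not rebuild Drupal's caches)
ddev restart && ddev drush cr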
When to clear cache:
- After configuration import (drush cim)
- After module enable/disable
- After routing changes
- After service definition changes
- After pretty much anything!
❌ Pitfall: Skipping Configuration Export
Problem:
# Made configuration changes in UI
# Didn't export to code
# Lost on next deployment!
Solution:
# After any UI configuration changes
ddev drush cex -y
# Commit changes
git add config/
git commit -m "feat: update content type configuration"
Automate with Git hook:
#!/bin/bash
# Save as .git/hooks/pre-commit and make it executable: chmod +x .git/hooks/pre-commit
ddev drush cex -y
git add config/
Workflow Orchestration
❌ Pitfall: Missing Workflow Dependencies
Problem:
stages:
  - name: deploy
    # Forgot depends_on!
    steps:
      - name: deploy_to_prod
        agent: deployment-orchestrator
Impact: Deploy runs before tests complete, deploys broken code.
Solution:
stages:
  - name: validate
    steps: [...]
  - name: test
    depends_on: [validate]
    steps: [...]
  - name: deploy
    depends_on: [test]
    condition: "{{ stages.test.status == 'passed' }}"
    steps: [...]
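The effect of `depends_on` can be checked before a workflow ever runs. The sketch below orders stages so that every dependency comes first and rejects unknown names and cycles. It assumes stages are plain objects with `name` and an optional `depends_on` array, as in the YAML above; how the orchestrator itself resolves order is not shown here.

```javascript
// Depth-first topological sort over stage dependencies.
function orderStages(stages) {
  const byName = new Map(stages.map((stage) => [stage.name, stage]));
  const state = new Map(); // name -> 'visiting' | 'done'
  const ordered = [];

  function visit(name, path) {
    if (state.get(name) === 'done') return;
    if (state.get(name) === 'visiting') {
      throw new Error(`Dependency cycle: ${[...path, name].join(' -> ')}`);
    }
    const stage = byName.get(name);
    if (!stage) throw new Error(`Unknown stage in depends_on: ${name}`);

    state.set(name, 'visiting');
    for (const dep of stage.depends_on || []) visit(dep, [...path, name]);
    state.set(name, 'done');
    ordered.push(name); // pushed only after all of its dependencies
  }

  for (const stage of stages) visit(stage.name, []);
  return ordered;
}
```

Note what this does not catch: a `deploy` stage with no `depends_on` at all is valid input and simply has no ordering constraint, which is exactly the pitfall described above.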
❌ Pitfall: No Timeout Configuration
Problem:
steps:
  - name: run_tests
    agent: test-runner
    # No timeout! Hangs forever if tests freeze.
Solution:
steps:
  - name: run_tests
    agent: test-runner
    capability: run_tests
    timeout: 10m # Fail after 10 minutes
    retry:
      max_attempts: 2
      backoff: exponential
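Timeouts and retries multiply, so it is worth estimating how long a step can take in the worst case. The sketch below makes several assumptions that the configuration above does not state: the timeout applies to each attempt, durations use an `s`/`m`/`h` suffix, and exponential backoff doubles from a base delay (`baseMs` is a made-up parameter).

```javascript
const UNIT_MS = { s: 1000, m: 60 * 1000, h: 60 * 60 * 1000 };

// "10m" -> 600000. Only whole numbers with an s/m/h suffix are handled.
function parseDuration(text) {
  const match = /^(\d+)(s|m|h)$/.exec(text);
  if (!match) throw new Error(`Unsupported duration: ${text}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

// Delays before retry 1, 2, ...; the first attempt starts immediately.
function backoffDelays(maxAttempts, baseMs) {
  return Array.from(
    { length: Math.max(0, maxAttempts - 1) },
    (_, i) => baseMs * 2 ** i
  );
}

// Every attempt runs to its timeout, plus all waits in between.
function worstCaseMs(timeout, maxAttempts, baseMs) {
  const waits = backoffDelays(maxAttempts, baseMs).reduce((sum, d) => sum + d, 0);
  return parseDuration(timeout) * maxAttempts + waits;
}
```

Under these assumptions, `timeout: 10m` with `max_attempts: 2` can hold a pipeline for a little over 20 minutes, which is worth knowing before raising either value.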
❌ Pitfall: Not Handling Workflow Failures
Problem:
# No failure handling
# Leaves deployments in inconsistent state
Solution:
on_workflow_failure:
  - name: rollback
    agent: deployment-orchestrator
    capability: rollback
    input:
      environment: "{{ env.ENVIRONMENT }}"
  - name: notify_team
    agent: slack-notifier
    capability: send_message
    input:
      channel: "#incidents"
      message: "Deployment failed: {{ workflow.error }}"
Production Deployment
❌ Pitfall: No Resource Limits
Problem:
# Kubernetes deployment without limits
spec:
  containers:
    - name: agent
      image: my-agent:latest
      # No resources! Agent can consume all cluster resources!
Solution:
spec:
  containers:
    - name: agent
      image: my-agent:latest
      resources:
        requests:
          cpu: 250m
          memory: 512Mi
        limits:
          cpu: 1000m
          memory: 2Gi
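Kubernetes quantity strings are easy to misread (`250m` is a quarter of a CPU core, `512Mi` is 512 × 2^20 bytes), and a request larger than its limit is rejected by the API server. A minimal sketch of a pre-deploy check follows; the helper names are hypothetical and only the suffixes used above (`m`, `Ki`, `Mi`, `Gi`) are handled.

```javascript
// "250m" -> 0.25 cores, "1" -> 1 core
function parseCpu(quantity) {
  return quantity.endsWith('m')
    ? Number(quantity.slice(0, -1)) / 1000
    : Number(quantity);
}

const BINARY_UNITS = { Ki: 2 ** 10, Mi: 2 ** 20, Gi: 2 ** 30 };

// "512Mi" -> bytes. Other suffixes are deliberately rejected in this sketch.
function parseMemory(quantity) {
  const match = /^(\d+)(Ki|Mi|Gi)$/.exec(quantity);
  if (!match) throw new Error(`Unsupported memory quantity: ${quantity}`);
  return Number(match[1]) * BINARY_UNITS[match[2]];
}

function checkResources(resources) {
  const { requests, limits } = resources || {};
  if (!requests || !limits) return ['requests and limits must both be set'];
  const problems = [];
  if (parseCpu(requests.cpu) > parseCpu(limits.cpu)) {
    problems.push('cpu request exceeds limit');
  }
  if (parseMemory(requests.memory) > parseMemory(limits.memory)) {
    problems.push('memory request exceeds limit');
  }
  return problems;
}
```

Passing the `resources` block from the solution above yields no problems; a container spec with no `resources` at all yields the first message.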
❌ Pitfall: Missing Health Checks in Kubernetes
Problem:
# Kubernetes doesn't know if pod is healthy
kubectl get pods
# Pod shows Running but agent is crashed inside
Solution:
spec:
  containers:
    - name: agent
      livenessProbe:
        httpGet:
          path: /health
          port: 3000
        initialDelaySeconds: 30
        periodSeconds: 10
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /ready
          port: 3000
        initialDelaySeconds: 15
        periodSeconds: 5
        failureThreshold: 3
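The three probe numbers combine into two practical questions: how soon is the first probe sent, and for how long can the agent keep failing before Kubernetes reacts? A rough sketch of that arithmetic is below. It ignores `timeoutSeconds` and scheduling jitter, and `probeTiming` is a hypothetical helper; the fallback values are the Kubernetes defaults for omitted fields.

```javascript
// Kubernetes defaults used when a probe omits a field
const PROBE_DEFAULTS = { initialDelaySeconds: 0, periodSeconds: 10, failureThreshold: 3 };

function probeTiming(probe) {
  const p = { ...PROBE_DEFAULTS, ...probe };
  return {
    // Earliest moment the kubelet probes the container after it starts
    firstProbeAfterSeconds: p.initialDelaySeconds,
    // Approximate upper bound on consecutive failures before a restart
    // (liveness) or removal from Service endpoints (readiness)
    maxFailureWindowSeconds: p.periodSeconds * p.failureThreshold,
  };
}
```

For the liveness probe above this gives a 30-second startup grace period and roughly 30 seconds of tolerated failure; an agent that needs longer than `initialDelaySeconds` plus that window to boot will be restarted in a loop.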
❌ Pitfall: Not Using Persistent Volumes
Problem:
# Pod restarts, loses all data!
kubectl delete pod my-agent-xyz
# Files, databases, logs - all gone!
Solution:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: agent-storage
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: fast-ssd
---
# Excerpt from the agent's pod spec (not a complete manifest)
spec:
  containers:
    - name: agent
      volumeMounts:
        - name: storage
          mountPath: /data
  volumes:
    - name: storage
      persistentVolumeClaim:
        claimName: agent-storage
❌ Pitfall: No SSL/TLS Configuration
Problem:
# Accessing service via HTTP
curl http://agents.yourcompany.com
# Insecure! Credentials sent in plaintext!
Solution:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: agents-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
    - hosts:
        - agents.yourcompany.com
      secretName: agents-tls
  rules:
    - host: agents.yourcompany.com
      http:
        paths:
          - path: /
            pathType: Prefix # Required in networking.k8s.io/v1
            backend:
              service:
                name: agents
                port:
                  number: 80
Performance & Optimization
❌ Pitfall: Not Enabling Caching
Problem:
// Recalculates expensive operation on every request
app.get('/data', async (req, res) => {
const data = await expensiveCalculation(); // Takes 5 seconds!
res.json(data);
});
Solution:
import Redis from 'ioredis';
const redis = new Redis(process.env.REDIS_URL);
app.get('/data', async (req, res) => {
// Check cache first
const cached = await redis.get('expensive-data');
if (cached) {
return res.json(JSON.parse(cached));
}
// Calculate and cache
const data = await expensiveCalculation();
await redis.setex('expensive-data', 3600, JSON.stringify(data));
res.json(data);
});
❌ Pitfall: Blocking Event Loop
Problem:
// Synchronous file operations block event loop
app.post('/process', (req, res) => {
const files = fs.readdirSync('./large-directory'); // Blocks!
files.forEach(file => {
const content = fs.readFileSync(file); // Blocks!
processFile(content);
});
res.json({ done: true });
});
Solution:
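import fs from 'node:fs';
import path from 'node:path';
app.post('/process', async (req, res) => {
  // Use async operations so the event loop stays free
  const dir = './large-directory';
  const files = await fs.promises.readdir(dir);
  await Promise.all(
    files.map(async file => {
      // readdir returns bare file names, so join them with the directory
      const content = await fs.promises.readFile(path.join(dir, file));
      await processFile(content);
    })
  );
  res.json({ done: true });
});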
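One caveat with `Promise.all` over a large directory: it starts every read at once, which can exhaust file descriptors or memory. A common remedy is to cap how many operations run concurrently. The sketch below is one simple way to do that; `mapWithLimit` is an illustrative helper, not a platform API.

```javascript
// Runs worker(item, index) for every item with at most `limit` in flight.
// Results keep the order of the input array.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner pulls the next unclaimed index until none are left.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

In the handler above, `Promise.all(files.map(...))` would become `mapWithLimit(files, 8, async file => { ... })`, where 8 is an arbitrary starting point to tune.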
❌ Pitfall: Not Monitoring Memory Usage
Problem:
# Agent crashes with OOM (Out of Memory)
kubectl get pods
# agent-xyz 0/1 OOMKilled
Solution:
// Monitor memory usage
setInterval(() => {
const usage = process.memoryUsage();
const mbUsed = Math.round(usage.heapUsed / 1024 / 1024);
logger.info('Memory usage', { heapUsed: mbUsed });
if (mbUsed > 1500) { // 1.5 GB threshold
logger.warn('High memory usage', { heapUsed: mbUsed });
// Trigger cleanup, restart, or scale
}
}, 60000); // Check every minute
Security
❌ Pitfall: Committing Secrets to Git
Problem:
# Committed .env file with secrets!
git add .env
git commit -m "Add config"
git push
# Secrets now in Git history forever!
Solution:
# Add to .gitignore
echo ".env" >> .gitignore
echo ".env.*" >> .gitignore
echo "*.pem" >> .gitignore
echo "*.key" >> .gitignore
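# If a secret was already committed, rotate it first: rewriting history does not un-leak it
# Then remove the file from Git history (Git's own docs now recommend git filter-repo,
# e.g. `git filter-repo --invert-paths --path .env`; the filter-branch form still works)
git filter-branch --force --index-filter \
"git rm --cached --ignore-unmatch .env" \
--prune-empty --tag-name-filter cat -- --all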
Store secrets securely:
# Use ~/.tokens/ directory
mkdir -p ~/.tokens
chmod 700 ~/.tokens
echo "secret-value" > ~/.tokens/service-name
chmod 600 ~/.tokens/service-name
// Reference in code (Node.js)
import fs from 'node:fs';
import os from 'node:os';
import path from 'node:path';
const token = fs.readFileSync(path.join(os.homedir(), '.tokens', 'gitlab'), 'utf8').trim();
❌ Pitfall: Not Validating Input
Problem:
// Vulnerable to injection attacks
app.post('/execute', (req, res) => {
const command = req.body.command;
exec(command); // Command injection!
});
Solution:
import { execFile } from 'node:child_process';
app.post('/execute', (req, res) => {
  const { command } = req.body;
  // Validate input
  if (!command || typeof command !== 'string') {
    return res.status(400).json({ error: 'Invalid command' });
  }
  // Allowlist permitted commands
  const allowedCommands = ['test', 'build', 'deploy'];
  if (!allowedCommands.includes(command)) {
    return res.status(403).json({ error: 'Command not allowed' });
  }
  // Run without a shell: execFile never passes the input through /bin/sh.
  // (HTML escaping such as validator.escape() does not prevent shell injection.)
  execFile(command, [], { timeout: 30000 }, (error, stdout) => {
    if (error) {
      return res.status(500).json({ error: error.message });
    }
    res.json({ output: stdout });
  });
});
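A stricter variant of the allowed-commands check is to never execute the request value at all and instead look up a fixed argument vector. The mapping below is purely hypothetical (the `npm run ...` scripts are placeholders for whatever each command should really do); the point is that user input only ever selects a key.

```javascript
// Hypothetical mapping: request value -> fixed executable and arguments
const COMMANDS = Object.freeze({
  test: ['npm', ['run', 'test']],
  build: ['npm', ['run', 'build']],
});

function resolveCommand(name) {
  // hasOwnProperty guards against inherited keys such as "constructor"
  if (typeof name !== 'string' || !Object.prototype.hasOwnProperty.call(COMMANDS, name)) {
    return null;
  }
  const [file, args] = COMMANDS[name];
  return { file, args: [...args] }; // copy so callers cannot mutate the table
}
```

A handler would respond with 403 when `resolveCommand` returns `null`, and otherwise pass `file` and `args` to `execFile` from `node:child_process`, which runs the program without a shell.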
❌ Pitfall: Missing Rate Limiting
Problem:
// No rate limiting: a single client can flood the endpoint
app.post('/webhook', async (req, res) => {
await processWebhook(req.body);
res.json({ received: true });
});
Solution:
import rateLimit from 'express-rate-limit';
const limiter = rateLimit({
windowMs: 15 * 60 * 1000, // 15 minutes
max: 100, // Limit each IP to 100 requests per windowMs
message: 'Too many requests, please try again later',
});
app.post('/webhook', limiter, async (req, res) => {
await processWebhook(req.body);
res.json({ received: true });
});
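For intuition about what `windowMs` and `max` mean, here is a minimal fixed-window counter. It is a teaching sketch under simple assumptions (in-memory state, one process, no eviction of old keys) and is not meant to describe how `express-rate-limit` is implemented internally.

```javascript
// Allows up to `max` calls per key within each window of `windowMs`.
// `now` is injectable so the behavior can be tested without waiting.
function createFixedWindowLimiter({ windowMs, max, now = Date.now }) {
  const windows = new Map(); // key -> { start, count }; never evicted in this sketch

  return function isAllowed(key) {
    const t = now();
    const current = windows.get(key);
    if (!current || t - current.start >= windowMs) {
      windows.set(key, { start: t, count: 1 }); // open a fresh window
      return true;
    }
    current.count += 1;
    return current.count <= max;
  };
}
```

Because the state lives in one process, several agent replicas behind a load balancer would each count separately; a shared store is needed for a cluster-wide limit.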
Quick Reference: Troubleshooting Commands
# DDEV
ddev describe # Show DDEV project info
ddev logs # View container logs
ddev restart # Restart containers
ddev delete -O && ddev start # Nuclear option: deletes the project and its database WITHOUT a snapshot, then rebuilds
# BuildKit
buildkit agents status <name> # Check agent health
buildkit agents logs <name> # View agent logs
buildkit agents restart <name> # Restart agent
buildkit ossa validate <file> # Validate OSSA manifest
# Kubernetes
kubectl get pods -n agents # List agent pods
kubectl describe pod <pod> # Pod details
kubectl logs <pod> --follow # Stream logs
kubectl exec -it <pod> -- /bin/sh # Shell into pod
# Drupal
ddev drush status # Drupal status
ddev drush cr # Clear cache
ddev drush cex -y # Export config
ddev drush cim -y # Import config
ddev drush updb -y # Run database updates
Next Steps
- Review System Requirements for optimal setup
- Follow Development Setup for best practices
- Learn Production Deployment patterns