Building a 24/7 AI Agent Platform from Scratch: Lessons from a 300K LOC System
I'm Denis Shokhirev, an enterprise AI architect based in Erlangen, Germany. At DennisCraft AI Studio, I ship AI systems to DACH B2B clients in logistics, fintech, and industrial automation, using a stack of Claude, Supabase, n8n, Doppler, and self-hosted Postgres. Shipping 14 production AI agents in six months exposed pain points that don't show up in demos: concurrency bugs, token exhaustion, and LLM-generated code that carries real security risk. This post breaks down the architecture of my 300K LOC platform: the patterns that survive regulated European production, not slides.
Core Architecture: Stable, Isolated Agents First
Design Pattern
The goal is to isolate each AI agent behind a stable pipeline: job queues, failure tracking, tight control of external calls (Claude Code, OpenAI API), and granular monitoring. Each agent runs as a separate process, orchestrated via async queues (Supabase Realtime, Redis pub/sub). Why not microservices? For AI agents, process pools are simpler; once agents are split into many services, shared state and API quota management become a nightmare.
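To make the isolation idea concrete, here is a minimal sketch of running one agent step in its own operating-system process and counting failures per agent. It is an illustration under assumptions, not the platform's actual supervisor: `run_isolated`, the `failures` counter, and the one-child-per-step model are hypothetical, and a long-lived worker fed by a queue would replace the short-lived child in practice.

```python
import subprocess
import sys
from collections import Counter
from typing import Optional

# Hypothetical in-memory failure tracker: agent name -> number of failed runs.
failures = Counter()


def run_isolated(agent_name: str, script: str, timeout: float = 30.0) -> Optional[str]:
    """Run one agent step in a separate Python process.

    Returns the child's stdout on success, or None if it crashed or timed out.
    A crash in the child never takes down the supervising process.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", script],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        failures[agent_name] += 1
        return None
    if proc.returncode != 0:
        failures[agent_name] += 1
        return None
    return proc.stdout.strip()
```

The point of the sketch is the boundary: a failing agent only increments its own counter, and the supervisor keeps running.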
Production-Grade Stack (with Real Drawbacks)
| Component | Why I Chose It | Pain Points |
|---|---|---|
| Claude Code / Anthropic SDK | Best price/quality for reasoning-heavy agents | Strict rate limits, occasional latency spikes |
| Supabase | Fast pub/sub and metadata storage | Realtime sometimes drops events, fallback needed |
| n8n | Pipeline orchestration, visual editing | Debugging deep chains is hard, retry bugs pop up |
| Doppler | Secret management, simple CI | Lacks granular audit trails for large teams |
| Self-hosted Postgres | GDPR compliance, data control | Bottlenecks under load; query tuning required |
Data Flow: From Request to Audit Log
Request Handling Pattern
Every incoming request (API or UI) is validated via pydantic schemas, then dropped into a Supabase queue. An agent process pulls pending jobs asynchronously and runs all steps: preprocessing, LLM call (Claude/Anthropic), postprocessing, and result storage in Postgres.
```python
import asyncio

from supabase import create_client


async def process_task(supabase_url, supabase_key):
    supabase = create_client(supabase_url, supabase_key)
    while True:
        # Fetch at most one pending task per poll.
        task = supabase.table('tasks').select('*').eq('status', 'pending').limit(1).execute()
        if task.data:
            job = task.data[0]
            # run_agent_logic is the agent's own pipeline, defined elsewhere.
            result = run_agent_logic(job)
            supabase.table('tasks').update({'status': 'done', 'result': result}).eq('id', job['id']).execute()
        # Yield to the event loop between polls.
        await asyncio.sleep(1)
```
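The validation step that runs before a job reaches the queue can be sketched as follows. The post uses pydantic; to keep this example dependency-free it uses a standard-library dataclass as a stand-in, and the field names, the agent names, and the payload limit are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical agent names; the real list is not shown in this post.
ALLOWED_AGENTS = {"invoice-extractor", "route-planner"}


@dataclass(frozen=True)
class TaskRequest:
    """Assumed request shape, standing in for a pydantic schema."""
    agent: str
    user_id: str
    payload: str

    def __post_init__(self):
        if self.agent not in ALLOWED_AGENTS:
            raise ValueError(f"unknown agent: {self.agent!r}")
        if not self.user_id:
            raise ValueError("user_id must not be empty")
        if not self.payload or len(self.payload) > 20_000:
            raise ValueError("payload must be 1..20000 characters")


def to_task_row(raw: dict) -> dict:
    """Validate raw input and build the row that would be inserted as a pending task."""
    req = TaskRequest(**raw)  # TypeError on missing or unexpected fields
    return {
        "agent": req.agent,
        "user_id": req.user_id,
        "payload": req.payload,
        "status": "pending",
    }
```

Rejecting bad input at this boundary means the agent process only ever sees rows that already match the expected shape.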
Audit Logging
Every LLM call is logged in a dedicated Postgres table: prompt, output, latency, user ID. For GDPR, I maintain a separate audit trail: who, when, what prompt, what output. After a fintech client incident where an LLM returned a risky output, I introduced manual review of a random 2% of jobs, using n8n + Notion as the review queue.
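A minimal sketch of an audit row plus the 2% review sampling might look like this. The field names and helper functions are assumptions. Note one deliberate swap: instead of a random draw, the sketch hashes the job ID, which gives roughly the same 2% rate but makes the decision reproducible when a job is reprocessed.

```python
import hashlib
import time
from dataclasses import asdict, dataclass

REVIEW_RATE = 0.02  # 2% of jobs, as described above


@dataclass
class AuditRecord:
    """Assumed column set: who, when, what prompt, what output, how slow."""
    job_id: str
    user_id: str
    prompt: str
    output: str
    latency_ms: float
    created_at: float


def needs_manual_review(job_id: str, rate: float = REVIEW_RATE) -> bool:
    """Pick roughly `rate` of jobs by mapping a hash of the job ID into [0, 1)."""
    digest = hashlib.sha256(job_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


def build_audit_row(job_id, user_id, prompt, output, latency_ms) -> dict:
    """Build the dict that would be inserted into the audit table."""
    record = AuditRecord(job_id, user_id, prompt, output, latency_ms, time.time())
    row = asdict(record)
    row["needs_review"] = needs_manual_review(job_id)
    return row
```

Rows flagged with `needs_review` are the ones a workflow tool would forward to the human review queue.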
Security: Never Trust LLM-Generated Code
LLM Output as a Production Risk
Most production vulnerabilities in my stack come not from inbound requests, but from LLM-generated code. On three recent agent deployments, I caught SQL-injection and unsafe shell execution patterns in Python snippets generated by Claude. For static analysis, I run semgrep, bandit, and occasionally gitleaks for secret scanning. This matches findings from the 2023 Anthropic LLM Security paper (source), which highlights prompt-injection and code-gen risks.
```shell
semgrep --config=p/python security/ --error
bandit -r ./agents/
gitleaks detect --source=./
```
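To illustrate the kind of pattern these scanners catch, here is a small standard-library sketch that parses a generated snippet and flags a few obviously risky calls. It is not a replacement for semgrep or bandit; the `DANGEROUS_CALLS` set and the function names are hypothetical and deliberately narrow.

```python
import ast

# Hypothetical, intentionally short deny-list for illustration only.
DANGEROUS_CALLS = {"eval", "exec", "os.system", "os.popen"}


def _dotted_name(node: ast.AST) -> str:
    """Return 'os.system' for an attribute chain, 'eval' for a bare name, else ''."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return ""


def find_risky_calls(source: str) -> list:
    """Return findings for a generated snippet; raises SyntaxError if it does not parse."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = _dotted_name(node.func)
        if name in DANGEROUS_CALLS:
            findings.append(f"line {node.lineno}: call to {name}")
        # Catch subprocess.run(..., shell=True) and similar calls.
        for kw in node.keywords:
            if kw.arg == "shell" and isinstance(kw.value, ast.Constant) and kw.value.value is True:
                findings.append(f"line {node.lineno}: shell=True in {name or 'call'}")
    return findings
```

A check like this is cheap enough to run on every generated snippet before the heavier scanners, and any finding can simply block execution.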
Sandboxing: Containing LLM Output
All LLM-generated code is executed in a sandboxed container (firejail + custom Docker) with strict limits on CPU, memory, and network calls. After a 2024 prompt-injection incident (a malicious SQL DELETE in a RAG agent), I added regex-based prompt filtering and enforced runtime sandboxes. No LLM code runs with production credentials or direct DB access.
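The regex-based prompt filtering mentioned above could be sketched like this. The patterns are hypothetical examples, not the filters used in production, and a deny-list like this is easy to evade, which is why the runtime sandbox remains the actual control.

```python
import re

# Hypothetical patterns; a real deny-list would be tuned to each agent's domain.
SUSPICIOUS_PATTERNS = [
    re.compile(r"\b(drop|truncate)\s+table\b", re.IGNORECASE),
    re.compile(r"\bdelete\s+from\b", re.IGNORECASE),
    re.compile(r"\bignore\s+(all\s+)?(previous|prior)\s+instructions\b", re.IGNORECASE),
]


def flag_prompt(text: str) -> list:
    """Return the matched fragments; an empty list means no pattern fired."""
    hits = []
    for pattern in SUSPICIOUS_PATTERNS:
        match = pattern.search(text)
        if match:
            hits.append(match.group(0))
    return hits
```

A flagged prompt does not have to be rejected outright; routing it to manual review is a reasonable alternative.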
Monitoring and Alerting: What Actually Works
Metrics and Alert Patterns
Metrics ship to a self-hosted Prometheus + Grafana setup: per-agent latency, error rates, queue health. For critical alerts, I push to a Telegram bot. Example: if latency exceeds 5 s or the error rate exceeds 2% over 10 minutes, a notification fires immediately.
```yaml
groups:
  - name: ai-agent-alerts
    rules:
      - alert: HighLatency
        expr: avg_over_time(agent_latency[5m]) > 5
        for: 2m
        annotations:
          summary: "High latency detected in AI agent"
```
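The thresholds described in the text (latency above 5 s or error rate above 2% over 10 minutes) can also be expressed as plain logic. In production Prometheus evaluates this; the sketch below only illustrates the rule, and the `AlertWindow` class and its method names are hypothetical.

```python
from collections import deque

WINDOW_SECONDS = 600     # 10 minutes
MAX_AVG_LATENCY = 5.0    # seconds
MAX_ERROR_RATE = 0.02    # 2%


class AlertWindow:
    """Sliding window over recent agent calls."""

    def __init__(self):
        # Each sample is (timestamp, latency_seconds, failed).
        self._samples = deque()

    def record(self, timestamp: float, latency: float, failed: bool) -> None:
        self._samples.append((timestamp, latency, failed))

    def should_alert(self, now: float) -> bool:
        # Drop samples that have fallen out of the window.
        while self._samples and self._samples[0][0] < now - WINDOW_SECONDS:
            self._samples.popleft()
        if not self._samples:
            return False
        count = len(self._samples)
        avg_latency = sum(s[1] for s in self._samples) / count
        error_rate = sum(1 for s in self._samples if s[2]) / count
        return avg_latency > MAX_AVG_LATENCY or error_rate > MAX_ERROR_RATE
```

Keeping the same thresholds in one place makes it easier to check that the alert rules and the documented policy agree.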
FAQ
Why not use an off-the-shelf no-code AI platform?
Everything on the market either fails GDPR (third-party data processing) or can't support complex agent pipelines. Self-hosting and full control are mandatory for European production.
How do you test agent pipeline reliability?
Every pipeline step has unit tests. Once a week, I run end-to-end tests via n8n. For LLM outputs, I snapshot results and diff against golden datasets.
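A snapshot comparison of this kind can be sketched with the standard library. The function names are hypothetical, and an exact diff like this assumes the output is stable enough to compare, for example structured output or a pinned temperature.

```python
import difflib


def normalize(text: str) -> list:
    """Strip trailing whitespace and edge blank lines so cosmetic changes do not fail the check."""
    return [line.rstrip() for line in text.strip().splitlines()]


def diff_against_golden(output: str, golden: str) -> list:
    """Return unified-diff lines; an empty list means the output matches the snapshot."""
    return list(
        difflib.unified_diff(
            normalize(golden),
            normalize(output),
            fromfile="golden",
            tofile="output",
            lineterm="",
        )
    )
```

A non-empty diff is a prompt for a human to decide whether the golden file or the agent is wrong.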
How do you handle API rate limits?
Job queues and retry logic. Supabase queues plus one process pool per LLM endpoint prevent 429 errors in production.
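The retry side of this can be sketched as exponential backoff with jitter. `RateLimitError` is a hypothetical stand-in for whatever exception the LLM client raises on HTTP 429, and the delay values are illustrative defaults, not the ones used in production.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the client's HTTP 429 exception."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn(); on RateLimitError wait with exponential backoff plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the queue mark the job as failed
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads retries so parallel workers do not hit the API in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter exists so tests can inject a fake and run without waiting.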
How do you manage secrets and tokens?
Doppler for central secret storage, with role-based access. Critical keys are never logged and never leave the production server.
What about scaling up?
So far, horizontal scaling—new agent processes, separate queues, Postgres replicas—is enough. For 100+ agents, I'll move to Kubernetes or similar orchestration.
Which part of your agent stack causes the most production incidents: job queueing, LLM logic, or external service integrations? I'd genuinely like to know. I run a free 30-min stack audit for DACH founders building AI in regulated markets. DM me on LinkedIn or write to @ger_dennis_ai.