// capabilities

Reference implementations.
Runnable, measured, replicable.

Architecture patterns we deploy in production. Each one is instrumented with real evals and metrics, runnable inside your stack, and replicable across model and infra changes. Stack details and real numbers, not invented case studies.

CAP_01

Customer-support agent

Tool-calling agent over docs + ticket history.

Grounded retrieval-augmented agent that resolves tier-1 and tier-2 support tickets, hands off cleanly when uncertain, and writes back to your CRM. Built around a strict refusal policy and a continuously updated eval set.
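
A minimal sketch of that refusal-and-handoff gate, assuming the verifier emits a grounding score in [0, 1]. The thresholds and every name here (`DraftReply`, `gateReply`, the action labels) are illustrative, not the production contract:

```typescript
// Hypothetical shape of a drafted reply; the real contract lives in the reference repo.
interface DraftReply {
  text: string;
  groundingScore: number; // 0..1, fraction of claims matched to retrieved docs
  citedDocIds: string[];
}

interface TicketOutcome {
  action: "resolve" | "refuse" | "escalate";
  reply?: string;
}

// Illustrative thresholds; in production these are tuned against the eval set.
const REFUSE_BELOW = 0.5;
const ESCALATE_BELOW = 0.8;

function gateReply(draft: DraftReply): TicketOutcome {
  if (draft.groundingScore < REFUSE_BELOW || draft.citedDocIds.length === 0) {
    // Strict refusal policy: never answer without grounded citations.
    return { action: "refuse" };
  }
  if (draft.groundingScore < ESCALATE_BELOW) {
    // Uncertain but plausible: hand off to a human with the draft attached.
    return { action: "escalate", reply: draft.text };
  }
  return { action: "resolve", reply: draft.text };
}
```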

Stack
Claude 3.5 Sonnet · GPT-4o (fallback) · pgvector · LangGraph · Langfuse · Temporal
Architecture sketch
User: Inbound message via Slack / web / email
Router: Classifier picks specialist + retrieval mode
Retriever: Hybrid BM25 + dense, with metadata filters
Planner: Decomposes into tool calls, max depth 5
Tools: CRM lookup · order status · refund · escalate
Verifier: Checks grounding, then commits action
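
A plain-TypeScript sketch of that pipeline. The `Stages` interface and every name in it are stand-ins for the reference implementation's instrumented LangGraph graph, not its actual API:

```typescript
// Stand-in contracts for the six stages; in the reference stack these run as
// LangGraph nodes with Langfuse tracing and Temporal-managed retries.
type RetrievalMode = "docs" | "tickets" | "both";
type Decision = "committed" | "escalated";

interface RoutedMessage { specialist: string; mode: RetrievalMode; text: string; }
interface RetrievedChunk { docId: string; text: string; score: number; }
interface ToolCall {
  name: "crm_lookup" | "order_status" | "refund" | "escalate";
  args: Record<string, unknown>;
}
interface StagedAction { call: ToolCall; result: unknown; }

interface Stages {
  route(raw: string): Promise<RoutedMessage>;            // classifier
  retrieve(m: RoutedMessage): Promise<RetrievedChunk[]>; // hybrid BM25 + dense
  plan(m: RoutedMessage, ctx: RetrievedChunk[]): Promise<ToolCall[]>;
  stage(call: ToolCall): Promise<StagedAction>;          // dry-run, no side effects
  verify(m: RoutedMessage, ctx: RetrievedChunk[], s: StagedAction[]): Promise<boolean>;
  commit(staged: StagedAction[]): Promise<void>;         // CRM write-back, refunds, etc.
}

async function handleInbound(s: Stages, raw: string): Promise<Decision> {
  const routed = await s.route(raw);
  const context = await s.retrieve(routed);
  // The sketch flattens "max depth 5" into a simple cap on tool calls.
  const calls = (await s.plan(routed, context)).slice(0, 5);
  const staged: StagedAction[] = [];
  for (const call of calls) staged.push(await s.stage(call));
  // Verifier checks grounding before any action is committed.
  if (!(await s.verify(routed, context, staged))) return "escalated";
  await s.commit(staged);
  return "committed";
}
```
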
Production metrics
Eval accuracy: 94.6% (golden set, 240 fixtures)
p95 latency: 812 ms (end-to-end)
Cost / req: $0.014 (incl. retrieval)
Auto-resolve: 67% (no human required)
Escalation: 96% (correctly routed)
Refusal rate: 2.1% (when confidence low)
Numbers reflect production runs from the founding team's prior work, replicated in our reference stack. Replicable in your account as part of the audit.
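
How a number like the 94.6% eval accuracy is produced, in sketch form. The fixture shape and scoring rule are illustrative; the real harness also tracks latency and cost per request:

```typescript
// Illustrative golden-set fixture: a ticket plus the expected terminal action.
interface Fixture {
  ticket: string;
  expected: "resolve" | "refuse" | "escalate";
}

async function evalAccuracy(
  fixtures: Fixture[],
  run: (ticket: string) => Promise<"resolve" | "refuse" | "escalate">,
): Promise<number> {
  let correct = 0;
  for (const f of fixtures) {
    if ((await run(f.ticket)) === f.expected) correct++;
  }
  // e.g. 227 correct / 240 fixtures ≈ 0.946 on the golden set
  return correct / fixtures.length;
}
```
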
// stack

Technology we ship on.

Models

Claude (Anthropic)
GPT-4o / GPT-5
Llama 3
Fine-tunes

Retrieval

pgvector
Qdrant
Postgres FTS
BM25

Orchestration

LangGraph
Temporal
Inngest
Trigger.dev

Observability

Langfuse
OpenTelemetry
Sentry
Grafana

Frontend

Next.js 14+
React Native
Tailwind
shadcn/ui

Infra

AWS
Cloudflare
Vercel
Modal
Fly.io

Want one of these in your stack?

Every reference implementation can be adapted, extended, or replaced with something better, built around your data and your constraints.