Write your evals before you write your RAG
Most teams build the retrieval system, then ask whether it works. The teams that ship reliably build the eval harness first.
Tool-calling in production: what the benchmarks don't show
Benchmark scores look clean. Production tool calls look nothing like them. Here's what breaks.
Context window management for long-running agents
Past some token budget, the model forgets. How you handle that boundary defines whether your agent degrades gracefully or fails hard.
pgvector vs. dedicated vector DB: the honest comparison
For most production systems we've shipped, pgvector was the right call. Here's the reasoning and the caveats.
Prompt versioning isn't optional in production
A prompt that worked in March may not work in May — model updates, data drift, or just entropy. Version your prompts like code.
Setting latency budgets for AI features before you build them
What users will tolerate is not what your API timeout allows. Set your p95 latency budget on day one.
One note a fortnight. No spam.
Practical write-ups on agents, evals, retrieval, and product engineering. Straight to your inbox.