// notes

Notes from the workshop floor.

Practical write-ups on agents, evals, retrieval, and product engineering. No thought leadership, no 10x developer posts.

agents
21 Apr 2026

Tool-calling in production: what the benchmarks don't show

Benchmark scores look clean. Production tool calls look nothing like benchmarks. Here's what breaks.

9 min read

engineering
14 Apr 2026

Context window management for long-running agents

At some token budget, the model forgets. How you handle that boundary determines whether your agent degrades gracefully or fails hard.

6 min read

retrieval
7 Apr 2026

pgvector vs. dedicated vector DB: the honest comparison

For most production systems we've shipped, pgvector was the right call. Here's the reasoning and the caveats.

8 min read

practice
31 Mar 2026

Prompt versioning isn't optional in production

A prompt that worked in March may not work in May — model updates, data drift, or just entropy. Version your prompts like code.

5 min read

engineering
24 Mar 2026

Setting latency budgets for AI features before you build them

User tolerance for AI latency is not the same as API timeout limits. Set your p95 budget on day one.

4 min read