Selected writeups
Anonymised notes from real engagements. Each one is a system I actually shipped, not a think-piece. Numbers come from real clients. These get technical. If you want the plain-English version, the home page covers what I do.
01 / 2026-04-08 / 4 min / ai / llm / underwriting / fintech / production
Not a model demo. A workflow tool the credit team actually opened every morning. Built in 10 weeks, it took manual review off the top decile of cases and cut handling time per accepted draft by roughly five minutes against the pre-launch six-minute baseline. Here is how it shipped without an LLM-replaces-humans pitch.
02 / 2025-09-12 / 2 min / llm / infra / cost / latency
An orchestration layer that picks the right provider per request. 28% lower provider/API spend against the prior single-provider baseline, normalised for request volume and token mix. p95 latency stayed sub-second. Caller code never changed.
03 / 2025-05-22 / 2 min / llm / evals / rag / production
Retrieval and prompt evaluation pipelines that drove an 18% relative lift in rubric pass rate over the prior eval harness, measured on production-derived canary sets. Plus why most eval setups silently lie to you.
04 / 2024-02-18 / 3 min / payments / fraud / ml / production
Fraudulent transactions fell 70% and manual review load fell 55% relative to the pre-model rules-and-review baseline, normalised for volume over the post-rollout measurement window. What worked, what the model could not solve on its own, and the three pieces we built before the model went live.
05 / 2023-09-04 / 3 min / payments / reliability / postgres / production
What 99.999% actually means at 20M transactions a month. The Postgres patterns, the idempotency surface, and the operational tax that nobody talks about until they have already missed an SLA.