From AI demo to product: where the real work is
2026-02-12 / 3 min / ai / production / consulting / founders
Many AI projects never ship. In my experience, the reason is rarely the model. A short note on the engineering work that lives between a working demo and a feature in real customers' hands.
Many AI projects never make it to production. In the ones I have worked on, the reason is rarely the model.
I have spent the last several years inside teams building AI features, sometimes as a full-time engineer and lately as an outside consultant. The pattern shows up often. A demo works on a Monday. By Friday, someone has shown it to a few people and a slide deck exists. Three months later the project is "still in progress." Six months later the team has moved on to something else.
The model itself is usually fine. The work that kills these projects lives somewhere else.
What "a working demo" actually means
A demo, in the way most teams use the word, means the AI feature produced a reasonable output one time on a hand-picked input, in a controlled environment, with someone watching. The output looked impressive. A screenshot was probably taken.
Real users do not behave like demo audiences. They paste in weird inputs. They ask follow-up questions that the demo never tested. They use the feature at 11pm on a Sunday when nobody is watching. They expect it to work the same way every time.
The gap between "looked good in a meeting" and "consistently useful in real hands" is where much of the engineering work lives. Teams often underestimate this gap several times over.
What actually has to happen
A short list of the things that have to be built before an AI feature stops being a demo:
- Evaluation - you need a way to measure whether the AI is getting better or worse with each change. Without an eval harness, every prompt tweak is a guess and every regression is invisible until a user complains.
- Failure handling - what happens when the AI returns nothing, returns the wrong thing, or takes too long. Real systems have to keep working when the model misbehaves.
- Cost control - AI calls cost money per request. A feature that costs five cents per user is fine. A feature that costs five dollars per user is a business problem. Most teams discover the difference too late. Routing requests across providers is a common fix once they do, with savings usually in the 20 to 30% range.
- Data plumbing - the AI needs access to the right context. Whatever search, retrieval, or memory it relies on is its own engineering project, often bigger than the AI itself.
- Production guardrails - rate limits, prompt injection defense, output validation, logging that survives a postmortem. These exist on every production system. Skipping them on AI features is how compliance teams find out you shipped one.
Each of these is non-trivial. Each one can soak up weeks of engineering time. None of them are visible in a demo.
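To make the first two items concrete, here is a minimal Python sketch of a guarded model call plus a tiny eval harness. It is an illustration under stated assumptions, not a recommended implementation: `guarded_call`, `run_eval`, `EvalCase`, `toy_model`, and the `FALLBACK` text are all hypothetical names invented for this example, and the "contains the expected text" check is about the simplest scoring rule possible.

```python
import concurrent.futures
from dataclasses import dataclass
from typing import Callable

# Hypothetical fallback shown to the user when the model misbehaves.
FALLBACK = "Sorry, I could not answer that. Please try again."


def guarded_call(model: Callable[[str], str], prompt: str, timeout_s: float = 2.0) -> str:
    """Return the model's answer, or FALLBACK on error, timeout, or empty output."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        answer = pool.submit(model, prompt).result(timeout=timeout_s)
    except Exception:  # the model raised, or the call exceeded timeout_s
        return FALLBACK
    finally:
        pool.shutdown(wait=False)  # do not block the caller on a slow call
    if not isinstance(answer, str) or not answer.strip():
        return FALLBACK
    return answer


@dataclass
class EvalCase:
    prompt: str
    must_contain: str  # simplest possible check; real evals are usually richer


def run_eval(model: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases whose answer contains the expected text."""
    if not cases:
        return 0.0
    passed = sum(
        case.must_contain.lower() in guarded_call(model, case.prompt).lower()
        for case in cases
    )
    return passed / len(cases)


def toy_model(prompt: str) -> str:
    """Stand-in for a real model call; fails on purpose for some inputs."""
    if "refund" in prompt:
        return "Refunds are processed within 5 days."
    if "crash" in prompt:
        raise RuntimeError("simulated provider error")
    return ""  # simulates the model returning nothing


if __name__ == "__main__":
    cases = [
        EvalCase("How do refunds work?", "refund"),
        EvalCase("Please crash", "anything"),
        EvalCase("Unknown topic", "pricing"),
    ]
    print(f"pass rate: {run_eval(toy_model, cases):.0%}")
```

Running the same cases after every prompt or model change turns "did that tweak help?" into a number you can compare. Note that the timeout only stops the caller from waiting; the slow call keeps running in its worker thread, so a real system would also want a timeout on the underlying request.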
What this means for your timeline
If you have a working demo and you are budgeting two weeks to "make it production-ready", you are budgeting for the visible work and ignoring the invisible work. The invisible work can be three to five times bigger than what you can see.
So the timeline in your head may be too short by a similar factor. That is fine if you know it. It is a serious problem if you have already committed to a launch date.
What I tell founders
If you have an AI demo and you want to ship it, the right first questions are not about the model. They are about evaluation, failure modes, data, cost, and guardrails. If you cannot answer those five on a whiteboard in 30 minutes, the demo is not ready to become a product yet.
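The cost question, for example, can be roughed out in a few lines before any whiteboard session. The Python sketch below is a back-of-envelope illustration only: the `UsageAssumptions` fields, the token counts, and the per-million-token prices are made-up placeholders, not real provider pricing, and a real model would include more terms than these.

```python
from dataclasses import dataclass


@dataclass
class UsageAssumptions:
    # Every field is an assumption you would replace with your own estimates.
    requests_per_user_per_month: float
    input_tokens_per_request: float
    output_tokens_per_request: float
    usd_per_million_input_tokens: float   # placeholder price, not a real quote
    usd_per_million_output_tokens: float  # placeholder price, not a real quote


def monthly_cost_per_user(u: UsageAssumptions) -> float:
    """Estimated model spend in USD per active user per month."""
    cost_per_request = (
        u.input_tokens_per_request * u.usd_per_million_input_tokens
        + u.output_tokens_per_request * u.usd_per_million_output_tokens
    ) / 1_000_000
    return cost_per_request * u.requests_per_user_per_month


# Two hypothetical usage profiles at the same placeholder prices.
light = UsageAssumptions(20, 1_000, 300, 1.0, 4.0)
heavy = UsageAssumptions(400, 8_000, 1_000, 1.0, 4.0)

if __name__ == "__main__":
    print(f"light: ${monthly_cost_per_user(light):.3f} per user per month")
    print(f"heavy: ${monthly_cost_per_user(heavy):.2f} per user per month")
```

With these placeholder numbers the light profile lands at a few cents per user per month and the heavy one at several dollars, at identical prices. The difference comes entirely from usage and context size, which is why the estimate is worth doing before launch rather than after the first invoice.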
This is most of what I do as an independent engineer. Teams have a working demo. They have a deadline. They need someone who has built this kind of thing before to move it across the gap.
If this sounds like where you are right now, send a brief.
