Every estimate here is calibrated against real, documented builds, not vendor marketing. These are the anchors: the optimized $1,100 framework rebuild, the $30,983 runaway month, and the messy middle. The gap between them is the whole point.
Framework rebuild · ~$1,100 · ~1 week · Claude API (Opus 4.5/4.6) · LOW anchor
94% of the Next.js API surface, rebuilt on Vite; ~800 sessions
The LOW regime. A comprehensive public test suite gave the agent an automated verification loop, and an expert operator steered it away from dead ends. Those two factors are what separate a $1,100 build from a $30,000 one.
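A minimal sketch of why an automated verification loop keeps spend low, assuming a hypothetical `attempt` callable (one agent turn, returning the tokens it used) and a hypothetical `verify` callable (for example, running the test suite). The pass/fail signal gives the loop a stop condition, and `max_attempts` plays the operator's role of abandoning a direction that isn't converging. None of these names come from a real agent API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoopResult:
    passed: bool
    attempts: int
    tokens_spent: int


def run_verify_loop(
    attempt: Callable[[int], int],
    verify: Callable[[], bool],
    max_attempts: int,
) -> LoopResult:
    """Alternate agent attempts with an automated check until it passes.

    `attempt(i)` is a hypothetical stand-in for one agent turn and returns
    the tokens it consumed; `verify()` is a hypothetical stand-in for running
    the test suite. `max_attempts` is the budget cap an operator would enforce.
    """
    tokens = 0
    for i in range(1, max_attempts + 1):
        tokens += attempt(i)
        if verify():  # green suite: stop spending
            return LoopResult(passed=True, attempts=i, tokens_spent=tokens)
    # Cap reached without a pass: abandon this direction instead of looping on.
    return LoopResult(passed=False, attempts=max_attempts, tokens_spent=tokens)


# Illustrative run: the (fake) suite goes green on the third attempt.
checks = iter([False, False, True])
result = run_verify_loop(lambda i: 50_000, lambda: next(checks), max_attempts=5)
print(result)  # LoopResult(passed=True, attempts=3, tokens_spent=150000)
```

Without `verify`, the loop has no reason to stop early; without the cap, a wrong direction burns tokens indefinitely.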
Web app (runaway) · $30,983 · 1 month · Claude Code, Codex, Synthetic · HIGH anchor
Leaderboard web app; 51,414 API events, ~17B tokens in 30 days
The HIGH regime. Continuous sessions with no context resets let verbose terminal output stream back into the prompt, so every call re-sent an ever-larger context and cumulative cost grew far faster than linearly. This is the cautionary tale every High band is calibrated against.
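A toy model of the runaway mechanism, assuming no prompt caching, a fixed amount of terminal output appended per turn, and every call re-sending the full accumulated context. The function and parameter names are illustrative, not taken from any real tool; `reset_every` stands in for starting a fresh session.

```python
from typing import Optional


def cumulative_input_tokens(
    turns: int,
    base_prompt: int,
    output_per_turn: int,
    reset_every: Optional[int] = None,
) -> int:
    """Total input tokens billed over `turns` calls.

    Assumes (hypothetically) that each call re-sends the whole accumulated
    context, that each turn appends `output_per_turn` tokens of terminal
    output, and that there is no prompt caching. If `reset_every` is set,
    the context drops back to `base_prompt` every `reset_every` turns.
    """
    total = 0
    context = base_prompt
    for turn in range(turns):
        if reset_every and turn > 0 and turn % reset_every == 0:
            context = base_prompt  # fresh session: history discarded
        total += context  # this call pays for the full context
        context += output_per_turn  # its output streams back into the prompt
    return total


for n in (100, 200):
    print(n, cumulative_input_tokens(n, base_prompt=10_000, output_per_turn=2_000))
print("200 turns, reset every 50:",
      cumulative_input_tokens(200, 10_000, 2_000, reset_every=50))
```

With these made-up parameters, doubling the session from 100 to 200 turns nearly quadruples billed input (10.9M to 41.8M tokens), because the total grows with the square of the turn count. Resetting every 50 turns brings the 200-turn total down to 11.8M. These figures illustrate the shape of the curve, not the runaway build itself.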
Memo iOS app · ~340M tokens (~$1,020) · Mobile app · 5 months · Claude API · EXPECTED anchor
Native memo app; core utilities + UI components
Context decay over a long build: even minor changes made the agent re-analyze the codebase, inflating cumulative spend. Anchors the mobile Expected band.
~9.7M input tokens per feature · Academic benchmark · per task · Claude Opus 4.5 + others
End-to-end, feature-level coding tasks
State-of-the-art models solved only ~11% of feature-level tasks despite high bug-fix scores. That gap suggests that building new features burns tokens on retries, and it is the anchor for feature-count mode.
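A toy retry model to show how a low success rate inflates spend. It assumes every attempt costs the same and succeeds independently with probability `p`, so attempts until the first success follow a geometric distribution with mean `1/p`. The ~11% figure is a benchmark pass rate, not a per-retry probability, and real retries are correlated; treating one as the other is a simplifying assumption, not the benchmark's methodology.

```python
def expected_tokens_until_success(tokens_per_attempt: float, p_success: float) -> float:
    """Mean tokens spent before the first success, assuming each attempt costs
    the same and succeeds independently with probability p_success (a
    geometric distribution, whose mean number of trials is 1 / p)."""
    if not 0 < p_success <= 1:
        raise ValueError("p_success must be in (0, 1]")
    return tokens_per_attempt / p_success


def p_success_within(k: int, p_success: float) -> float:
    """Chance of at least one success in k independent attempts."""
    return 1 - (1 - p_success) ** k


print(f"{expected_tokens_until_success(9.7e6, 0.11) / 1e6:.1f}M tokens per solved feature")
print(f"{p_success_within(5, 0.11):.2f} chance of success within 5 attempts")
```

Under that (strong) independence assumption, the card's ~9.7M tokens per attempt and p ≈ 0.11 work out to roughly 88M tokens per solved feature, with only about a 44% chance of success within five tries.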
Prototype feature · ~200k tokens (~$4.50) · Isolated component · 3 weeks · Blended models
Single feature addition / isolated component build
The floor: a well-scoped, isolated change against a small surface costs almost nothing. Most of the spread above this is context size and retry loops, not the code itself.
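A quick arithmetic check on the priced anchors: dividing each rounded dollar figure by its token count gives an implied blended rate. The cards don't break tokens down by type or model, so each rate blurs together whatever mix of input, output, and cached tokens the total includes; only the relative spread is meaningful. The runaway's $30,983 comes from the intro.

```python
# Rounded figures from the anchor cards: (approx tokens, approx USD).
ANCHORS = {
    "Memo iOS app": (340e6, 1_020.0),
    "Prototype feature": (200e3, 4.50),
    "Web app (runaway)": (17e9, 30_983.0),
}


def usd_per_million(tokens: float, usd: float) -> float:
    """Implied blended price per 1M tokens, whatever the token mix."""
    return usd / (tokens / 1e6)


for name, (tokens, usd) in ANCHORS.items():
    print(f"{name}: ${usd_per_million(tokens, usd):.2f} per 1M tokens")

token_counts = [t for t, _ in ANCHORS.values()]
rates = [usd_per_million(t, u) for t, u in ANCHORS.values()]
print(f"token volume spread: {max(token_counts) / min(token_counts):,.0f}x")
print(f"price-per-token spread: {max(rates) / min(rates):.1f}x")
```

On these rounded numbers, token volume varies about 85,000x across the anchors while the implied price per token varies only about 12x, which fits the claim that volume (context size and retries), not the per-token price, drives most of the spread.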
Have a documented build to add? The plan is a public, growing library and, eventually, a leaderboard where you upload your own /usage export to see where your burn rate lands.