Your AI Agent Bill Is Not the Prompt. It Is the Retry Loop.
Almost everyone budgets AI the same wrong way. They look at the price of one call - a few cents, maybe a fraction of one - multiply by how many times they expect to run it, and conclude the whole thing is cheap. Then the monthly bill arrives an order of magnitude higher than the estimate, and nobody can point to where it went. The math was not wrong. The model of how an agent spends money was wrong.
Here is the part that does not fit on a pricing page: a single agent task is not one call. An agent that uses tools - reading a file, calling an API, checking its own work - runs a loop. Think, act, read the result, think again. And the expensive, invisible detail is what happens to the context on every turn of that loop.
Why the loop multiplies, and the prompt does not
When an agent takes a step, the model does not remember the previous steps for free. The entire conversation so far - the original instructions, every tool call, every result it read back - gets sent again as input on the next step. Step two re-pays for step one. Step three re-pays for steps one and two. The context grows, and you are billed for the whole accumulated pile, every single turn.
So a task you priced as "one call" can quietly become fifteen or twenty calls, each one larger than the last. The first prompt is the cheapest moment in the entire run. The cost lives in the tail - and the tail is exactly the part nobody estimated.
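To make the compounding concrete, here is a minimal sketch of the arithmetic. Every number in it is an illustrative assumption rather than a figure from any provider's pricing page: a 2,000-token initial prompt, 1,500 tokens added to the context per step, and $3 per million input tokens. Output tokens are ignored to keep the example short.

```python
def input_tokens_per_step(prompt_tokens: int, added_per_step: int, steps: int) -> list[int]:
    """Input tokens billed on each step when the full history is resent.

    The k-th entry (0-based) is the prompt plus everything the k earlier
    steps added to the context.
    """
    return [prompt_tokens + added_per_step * k for k in range(steps)]


def run_input_cost(prompt_tokens: int, added_per_step: int, steps: int,
                   usd_per_million_input: float) -> float:
    """Total input cost in USD for one run of `steps` steps."""
    total = sum(input_tokens_per_step(prompt_tokens, added_per_step, steps))
    return total * usd_per_million_input / 1_000_000


# Hypothetical numbers, chosen only for illustration.
PROMPT = 2_000   # tokens in the initial prompt
ADDED = 1_500    # tokens each step appends (model output + tool result)
PRICE = 3.0      # USD per million input tokens

one_call = run_input_cost(PROMPT, ADDED, 1, PRICE)
twenty_steps = run_input_cost(PROMPT, ADDED, 20, PRICE)
print(f"1 step:   ${one_call:.4f}")
print(f"20 steps: ${twenty_steps:.4f} ({twenty_steps / one_call:.1f}x)")
```

Under these assumptions the 20-step run bills 325,000 input tokens against 2,000 for the single call: roughly 160 times the cost, not 20 times. Halving the prompt to 1,000 tokens would trim that total by about 6%.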
The contrarian version, stated plainly: shortening your prompt is about the lowest-leverage cost move there is. The prompt is resent on every step, but it never grows, so it is a small fixed fee per step. The bill is set by how many times the agent loops and how much accumulated context it drags through each loop. You are tuning the cheap thing.
Where the real money actually leaks
- Retries on failure. A tool errors, the agent does not understand why, and it tries again - with the failed attempt now added to the context it re-pays for. A flaky integration is not a reliability problem with a cost footnote. It is a cost problem.
- The loop that does not know it is stuck. An agent that cannot make progress will often keep going anyway, re-reading the same growing context and producing slight variations of the same step. Nothing crashes. The meter just runs.
- Re-reading large outputs. One tool returns a big blob - a full file, a long API response - and that blob now rides along in the context for every remaining step of the task, paid for again and again.
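The effect of these leaks can be sketched with a small simulation. The function below assumes each step resends the whole context and then appends its own output and tool result. The token counts (a 2,000-token prompt, 1,000 tokens per ordinary step, a 20,000-token file) are hypothetical, picked only to show the shape of the effect.

```python
def total_input_tokens(prompt_tokens: int, additions: list[int]) -> int:
    """Sum of input tokens across a run where each step resends all history.

    `additions[i]` is how many tokens step i appends to the context
    (the model's output plus whatever tool result it read back).
    """
    context = prompt_tokens
    billed = 0
    for added in additions:
        billed += context   # this step pays for everything so far
        context += added    # and leaves more behind for the next step
    return billed


PROMPT = 2_000            # hypothetical prompt size
baseline = [1_000] * 10   # ten steps, 1,000 tokens added by each

# Same run, but step 3 reads a 20,000-token file instead of a small result.
with_blob = baseline.copy()
with_blob[2] = 20_000

# Same run, plus three failed attempts at a flaky tool after step 3.
with_retries = baseline[:3] + [1_000] * 3 + baseline[3:]

print(total_input_tokens(PROMPT, baseline))      # 65000
print(total_input_tokens(PROMPT, with_blob))     # 198000
print(total_input_tokens(PROMPT, with_retries))  # 104000
```

In this sketch one large tool output in step 3 roughly triples the input tokens for the run, because the seven later steps each pay for it again. Three retries add 60%, even though each failed attempt is small.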
The one control that actually changes the bill
The fix is not a better prompt and it is not a cheaper model. It is a ceiling on the run, set before the run starts. It has three parts, in order of impact:
1. Cap the loop. Decide the maximum number of steps a task is allowed to take, and stop it there. An agent that has not finished a normal task in N steps is rarely about to finish it on step N+1 - it is far more likely stuck, and every further step is waste. The cap converts a runaway into a bounded, knowable cost.
2. Estimate before you execute. The most useful number is the one you get before the money is spent, not after. Knowing the likely cost of a run lets you refuse the expensive ones up front, instead of discovering them on the invoice. Spend recorded after the fact is an autopsy. A pre-run estimate is a decision.
3. Stop re-paying for the same context. Trim what rides along between steps. Large tool outputs do not all need to stay in the context for the rest of the task - summarize or drop them once they are used, so step twelve is not still paying for the giant blob step three read.
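The three controls above can be sketched together in one loop. This is a simulation under stated assumptions, not a real agent framework. `step_fn` is a hypothetical stand-in for "one model call plus one tool call", token counts are supplied by the caller rather than measured, and the names `run_agent`, `estimate_worst_case`, and `trim` are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable

STUB_TOKENS = 10  # assumed size of the placeholder left behind by trimming


@dataclass
class Message:
    role: str    # "user", "assistant", or "tool"
    text: str
    tokens: int  # token count, supplied by the caller in this sketch


# Hypothetical step function: returns the messages one step produced
# and whether the task is done.
StepFn = Callable[[list[Message]], tuple[list[Message], bool]]


def estimate_worst_case(prompt_tokens: int, per_step_guess: int, max_steps: int) -> int:
    """Input tokens billed if the run uses every allowed step."""
    return sum(prompt_tokens + per_step_guess * k for k in range(max_steps))


def trim(history: list[Message], max_tool_tokens: int) -> list[Message]:
    """Swap oversized tool results the model has already read for a stub."""
    return [
        Message("tool", "[large output dropped after use]", STUB_TOKENS)
        if m.role == "tool" and m.tokens > max_tool_tokens
        else m
        for m in history
    ]


def run_agent(prompt: Message, step_fn: StepFn, *, max_steps: int,
              budget_tokens: int, per_step_guess: int,
              max_tool_tokens: int) -> tuple[str, int, int]:
    """Return (status, steps_taken, input_tokens_billed)."""
    # 2. Estimate before you execute: refuse runs that could exceed the budget.
    if estimate_worst_case(prompt.tokens, per_step_guess, max_steps) > budget_tokens:
        return ("refused", 0, 0)

    history = [prompt]
    billed = 0
    # 1. Cap the loop: the for-loop cannot run past max_steps.
    for step in range(1, max_steps + 1):
        billed += sum(m.tokens for m in history)  # whole context is resent
        new_messages, done = step_fn(history)
        if done:
            return ("finished", step, billed)
        # 3. Stop re-paying: everything in `history` has now been read once.
        history = trim(history, max_tool_tokens) + new_messages
    return ("step_cap_reached", max_steps, billed)


def stuck_step(history: list[Message]) -> tuple[list[Message], bool]:
    """Simulated agent that never finishes and keeps reading a big file."""
    return ([Message("assistant", "retrying", 200),
             Message("tool", "<file contents>", 5_000)], False)


prompt = Message("user", "fix the failing test", 1_000)
print(run_agent(prompt, stuck_step, max_steps=8, budget_tokens=500_000,
                per_step_guess=6_000, max_tool_tokens=2_000))
print(run_agent(prompt, stuck_step, max_steps=8, budget_tokens=100_000,
                per_step_guess=6_000, max_tool_tokens=2_000))
```

With these made-up numbers the first call stops at the cap after 8 steps and 48,810 billed input tokens. The same stuck run without trimming would bill 153,600. The second call is refused before any tokens are spent, because the worst-case estimate of 176,000 tokens exceeds its 100,000-token budget. The estimate is deliberately pessimistic: it assumes every allowed step is used and nothing is trimmed.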
The order matters. Most teams reach for the model price or the prompt length because those are the numbers printed on the page. The leverage is in the shape of the loop, which no pricing page shows you - and which is exactly why the bill keeps surprising people who only ever looked at the price of one call.
If your AI feature works in the demo and scares you on the invoice, email me at kirill@launchsoloai.com with three things: what it does, roughly what it costs now, and where it runs. Within 24 hours you get back a free written teardown - where the spend actually goes and what I would cap first - or a straight no. You can also see my productized offers and pricing here.