When AI Agents Just Keep Thinking and Never Answer


Someone on Reddit lost their entire five-hour Claude session in 15 seconds. They asked for a yes or no answer. Instead, Claude spawned "Explore agents" and "Plan agents" that burned through every token before giving any response. The question never got answered.
This is happening everywhere. Not just to one person. AI agents, especially Claude, are getting stuck in reasoning loops and eating through context windows like they're paid to waste your money.
What's actually going on
The problem starts with how these agents think. Claude's extended thinking mode lets it reason through problems step by step. Sounds great. But here's what nobody tells you.
Every thought costs tokens. Your 200,000-token context window includes everything. The prompt. The response. And all that invisible reasoning Claude does behind the scenes. You see a short answer. Claude spent 30,000 tokens thinking about it.
One developer tracked their usage. 330 million input tokens in half a month. Half. A. Month. They weren't doing anything crazy. Just starting new chats. Claude was loading their entire codebase into context every single time.
The thinking budget can go up to 128,000 tokens. That's more than most people's entire conversations use. And it all happens before you see a single word of output.
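You can watch this from the API side. Here's a minimal sketch using the Anthropic Python SDK with extended thinking enabled (the model name is a placeholder; swap in whatever thinking-capable model your account exposes):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-7-sonnet-latest",   # placeholder; any model with extended thinking support
    max_tokens=16000,                   # must be larger than the thinking budget
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": "Is 2**61 - 1 prime? Just yes or no."}],
)

# The answer and the reasoning come back as separate content blocks...
answer = " ".join(block.text for block in response.content if block.type == "text")
print("visible answer:", answer)

# ...but usage.output_tokens counts the thinking too. That's what you pay for.
print("output tokens billed:", response.usage.output_tokens)
```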
The loop problem
Agents don't just think too much. They get stuck.
You ask them to summarize a document. They summarize it. Call save_file. Then re-read the document. Summarize it again. Call save_file again. Over and over until your API quota dies.
This isn't a bug in your prompt. It's called Loop Drift. The AI misinterprets "done." It decides a summary isn't really finished until it has re-summarized the document a few more times. Or it gets confused by an ambiguous tool output and keeps calling the same tool with the same arguments.
One guy watched his agent repeat the same action for hours.
The worst part is these loops look productive. The agent isn't frozen. It's actively doing things. Writing outputs. Calling functions. Just not making any progress toward finishing.
And every loop iteration eats more context. More tokens. More money.
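One blunt guard that helps: fingerprint every tool call and kill the run when the agent repeats the exact same call. Nothing framework-specific below, just a sketch:

```python
import json

def make_loop_guard(max_repeats: int = 2):
    """Raise when the same tool gets called with the same arguments too many times."""
    seen: dict[str, int] = {}

    def check(tool_name: str, tool_args: dict) -> None:
        # Same tool + same arguments = same fingerprint.
        key = f"{tool_name}:{json.dumps(tool_args, sort_keys=True)}"
        seen[key] = seen.get(key, 0) + 1
        if seen[key] > max_repeats:
            raise RuntimeError(
                f"Loop drift: {tool_name} called {seen[key]} times with identical arguments"
            )

    return check

# Call the guard before executing each tool:
#   guard = make_loop_guard()
#   guard(tool_name, tool_args)
# It kills the run before your API quota does.
```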
Context rot
Here's something weird. Bigger context windows don't fix this.
When you fill up the context, the agent gets dumber. Not a little dumber. A lot dumber. Studies show recall and accuracy fall off a cliff as context fills up.
It's called context poisoning. Bad information gets in early. The agent builds on that bad info. Makes more mistakes. Those mistakes go back into context. The whole thing compounds.
Or context distraction happens. Too much past information. The agent can't tell what's important anymore. It starts repeating old behaviors instead of reasoning fresh.
People assume shoving everything into a bigger window solves problems. It doesn't. It just gives you more room to fail.
The cost surprise
Extended thinking costs more than standard responses. Not because Anthropic charges extra for the feature. Because thinking generates way more output tokens. And output tokens are expensive.
You send a simple question. Claude thinks for 10,000 tokens. Responds with 500 tokens. You pay for 10,500 output tokens. That's 21 times the output you thought you were paying for.
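It's worth writing that arithmetic out once. The price below is an assumption, roughly in line with Sonnet-class output pricing; check the current price list before trusting the number:

```python
OUTPUT_PRICE_PER_MTOK = 15.00   # USD per million output tokens -- illustrative, not official

thinking_tokens = 10_000        # invisible reasoning
visible_tokens = 500            # the answer you actually read

billed = thinking_tokens + visible_tokens   # thinking bills as output
cost = billed / 1_000_000 * OUTPUT_PRICE_PER_MTOK
print(f"{billed} output tokens, roughly ${cost:.2f}")   # 10500 output tokens, roughly $0.16
```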
One benchmark test ran Claude 3.7 with extended thinking. Cost $36.83. The same benchmark with regular Claude 3.5? $17.72. Roughly double the price for better answers. Except sometimes those answers never come.
The usage meter jumps 11% instantly now. People used to see it climb slowly as Claude processed. Now it spikes immediately and you're left wondering what it did with all those tokens.
Why developers are mad
Check the Claude subreddit. It's full of people who are pissed.
After an update, Claude Code started summoning agents that "burn an insane amount of tokens" before answering. One person asked a simple question. Got a terrible plan. Said "that plan is awful, i just wanted yes or no." Claude said "You're absolutely right!" Then did the exact same thing again.
Another developer built a "Save State" protocol because they got tired of re-explaining their code. Every new session meant burning more tokens on context Claude already had.
The pattern is everywhere. Agent does too much planning. Burns tokens. Gives a bad answer. You correct it. It apologizes. Then repeats the mistake.
This isn't agents being helpful. It's agents being expensive and stubborn.
The naming thing
Side note. Why do we call these things "agents" anyway?
They're not agents. They're loops with fancy prompts. An agent implies autonomy and good judgment. These things can't judge when to stop thinking. They don't know when they're stuck. They just keep going until something external kills them.
It's like calling a while loop an "autonomous computational entity." Technically true. Completely misleading.
We should call them what they are. Expensive loops.
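Strip the branding away and the core of most agent frameworks looks roughly like the sketch below. The reply shape and the tools dict are made up for illustration, not any particular library:

```python
def run_agent(goal: str, llm, tools: dict, max_steps: int = 50):
    """The 'autonomous computational entity': a while loop with a token bill."""
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_steps):        # the only thing standing between you and your quota
        reply = llm(messages)         # reasoning tokens burn here, invisibly
        if reply.tool_call is None:
            return reply.text         # the model decided it's done (hopefully correctly)
        result = tools[reply.tool_call.name](**reply.tool_call.args)
        messages.append({"role": "assistant", "content": reply.text})
        messages.append({"role": "user", "content": f"Tool result: {result}"})  # context grows every lap
    raise TimeoutError("The expensive loop never decided it was finished")
```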
Who this actually affects
Not everyone hits this problem. If you're using Claude for quick questions, you're fine. Ask. Get answer. Move on.
But if you're building agentic systems? This is a nightmare. Your agent needs to call tools, process results, make decisions, call more tools. Every step adds context. Every decision needs reasoning tokens.
One agent builder said debugging tool calls with all this invisible reasoning is their biggest headache. You can't see what the agent is thinking. You just see your token counter spinning up and your agent not finishing.
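One partial fix is to stream the response and log the reasoning as it's generated, so at least the burn is visible while it happens. A sketch with the Anthropic SDK's streaming helper; the event names match the docs at the time of writing, but verify against the current SDK:

```python
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-3-7-sonnet-latest",   # placeholder model name
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": "Why does this test fail intermittently?"}],
) as stream:
    for event in stream:
        # Thinking arrives as its own delta type, separate from the visible text.
        if event.type == "content_block_delta" and event.delta.type == "thinking_delta":
            print(event.delta.thinking, end="", flush=True)   # watch the reasoning burn in real time
```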
And if you're on the API paying per token? You're watching real money disappear into reasoning loops that produce nothing useful.
Small projects don't care about context limits. Large codebases do. Small questions don't need extended thinking. Complex problems do. But complex problems are exactly where these loops happen most.
What people are trying
Some teams enforce semantic completion checks. After every agent response, they check whether the output actually matches the goal. If the agent says it's done but hasn't delivered, they break the loop manually.
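In practice that can be one extra, cheap model call: given the original goal and the agent's output, does the output actually satisfy it? A sketch, where judge() stands in for whatever small model you point at the problem:

```python
def goal_satisfied(goal: str, output: str, judge) -> bool:
    """Ask a cheap model whether the agent's output actually meets the stated goal."""
    verdict = judge(
        f"Goal: {goal}\n\nAgent output: {output}\n\n"
        "Does the output fully satisfy the goal? Answer only YES or NO."
    )
    return verdict.strip().upper().startswith("YES")

# In the loop: if the agent claims it's done but goal_satisfied() says no,
# break out or hand it to a human instead of letting it re-plan forever.
```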
Others limit reasoning tokens hard. Set the budget to 1,024 or 2,048 and don't let Claude exceed it. You get worse reasoning but at least it finishes.
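With the Anthropic API that's the same thinking parameter from the earlier sketch, just pinned near the floor (1,024 is, as far as I can tell, the smallest budget the API accepts):

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-7-sonnet-latest",                      # placeholder model name
    max_tokens=4096,
    thinking={"type": "enabled", "budget_tokens": 1024},   # hard floor: shallow reasoning, but it finishes
    messages=[{"role": "user", "content": "Does this config parse as valid YAML? Yes or no:\n\nkey: value"}],
)
```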
The nuclear option is resetting context frequently. Treat context as scarce. When it fills up, kill the session and start fresh. You lose continuity but you don't lose money to reasoning death spirals.
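In code it's just a threshold check: once the running token count crosses some fraction of the window, throw the history away and carry only a short summary forward. The numbers and the summarize() helper here are arbitrary placeholders:

```python
CONTEXT_LIMIT = 200_000   # Claude-class window
RESET_AT = 0.6            # arbitrary: reset well before recall starts degrading

def maybe_reset(messages: list[dict], used_tokens: int, summarize) -> list[dict]:
    """Kill the session before context rot sets in; keep only a compact summary."""
    if used_tokens < RESET_AT * CONTEXT_LIMIT:
        return messages
    summary = summarize(messages)   # one cheap call to compress what actually mattered
    return [{"role": "user", "content": f"Summary of the previous session:\n{summary}"}]
```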
But none of these feel good. They're all workarounds. Band-aids on a deeper problem.
The part nobody says
Extended thinking mode is visible now. Anthropic made it visible deliberately. So we could see what Claude is thinking and debug it better.
But making it visible created new problems. Now malicious actors can study the thought process and build better jailbreaks. And models might learn to hide certain thoughts if they know we're watching.
Anthropic admits this in their own post. Visible thinking helps with transparency. But it might incentivize models to think differently. Or less honestly.
That's a weird trade-off. We wanted to see the reasoning to fix loops. Now the reasoning might change because we're watching.
Still better than invisible token burning though.
The real issue
The core problem isn't tokens or context windows or loops. It's that these models don't know when to stop.
A human doing research knows when they've found enough information. They feel the diminishing returns. AI doesn't feel that. It just keeps reasoning until something external tells it to quit.
Extended thinking was supposed to make AI smarter. And it does. On benchmarks. On hard math problems. On tasks where more thinking actually helps.
But on normal questions? It's overkill. You don't need 10,000 thinking tokens to answer "what's the capital of France?" You don't need parallel reasoning paths to fix a typo.
The model can't tell the difference. So it overthinks everything. Burns tokens on trivial tasks. Gets stuck on complex ones.
And we pay for it. In money. In time. In sessions that die before they finish.
Someone said their agent spent three hours on the same problem. Three hours of API calls. All going nowhere.
That's the real cost. Not just money. The trust that it'll actually finish what you asked for.