AI Agents Keep Breaking. Here's How to Actually Control Them.


Everyone's building AI agents right now. Most of them are broken.
A developer on Reddit put it bluntly: "No control over input, no full control over output, but I'm still responsible." That's the problem. You build an agent that's supposed to handle customer queries. It works in testing. You deploy it. Then it starts hallucinating product features that don't exist. Or it gets stuck in a retry loop for three hours because you forgot to set a maximum attempt limit.
The promise was simple. Build autonomous agents that handle complex tasks. The reality is messier. Your agent decides which tool to call. Sometimes it picks the wrong one. Sometimes it keeps retrying the same failed action because nobody told it when to stop. And when something breaks, you're left staring at logs trying to figure out what decision the LLM made and why.
This is about guardrails. Not the corporate kind that sounds good in slide decks. The actual technical controls that keep your agent from doing something stupid. And why, even after multiple attempts, your output still goes sideways.
The Problem Nobody Mentions
AI agents aren't deterministic. Traditional software is. You call a function with the same input, you get the same output. Every time.
Agents don't work that way. Set the temperature to zero. Use the exact same prompt. You'll still get variations. Sometimes small. Sometimes big enough to break your entire workflow.
I've seen this play out. Developer builds an agent to process invoices. Works great for two weeks. Then one day it starts extracting the wrong fields. Same code. Same model. Different behavior.
The core issue is probabilistic outputs meeting deterministic systems. Your database expects a specific schema. Your API needs exact field names. But your LLM is playing probabilities, not following strict rules.
You can't fully predict what an LLM will do. But you can constrain what it's allowed to do.
Most people try to solve this with better prompts. They add examples. They write longer instructions. They beg the model to please follow the format this time.
That's not a guardrail. That's a suggestion.
What Actually Works
Real guardrails are enforcement mechanisms. Not requests.
Here's what people are doing that actually helps:
- Validate output against a schema before it touches any real API
- Set hard limits on retry attempts, usually three to five max
- Use state machines to control which tools are available at which step
- Add a reviewer agent that checks if output meets constraints
That last one is interesting. You have one agent generate a response. Another agent reviews it against your rules. If it fails, loop it back or reject it entirely.
Sounds inefficient. Sounds expensive. But it works better than hoping your prompt engineering is good enough.
The Schema Validation Approach
Define your output structure first. JSON schema. Handlebars template. Whatever format you need.
Then you make the LLM fill that template. Before the output goes anywhere, you validate it. Check types. Check required fields. Check business logic.
Only after validation passes do you let it execute real actions.
This doesn't eliminate randomness. But it contains the damage. The LLM can be creative within the template. It just can't break your downstream systems.
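Here's a minimal sketch of that flow in Python. It uses pydantic for the schema, and the invoice fields and sample payload are invented for illustration. Any schema validator works.

```python
from pydantic import BaseModel, ValidationError, field_validator

# Hypothetical output contract for an invoice-processing agent.
class InvoiceExtraction(BaseModel):
    invoice_id: str
    vendor: str
    total_cents: int

    @field_validator("total_cents")
    @classmethod
    def total_must_be_positive(cls, value: int) -> int:
        # Business-logic check, not just a type check.
        if value <= 0:
            raise ValueError("total_cents must be positive")
        return value

def validate_llm_output(raw_text: str) -> InvoiceExtraction | None:
    """Return a validated object, or None if the LLM output is unusable."""
    try:
        return InvoiceExtraction.model_validate_json(raw_text)
    except ValidationError:
        return None  # reject, retry, or escalate -- never touch the real API

# Only validated output gets anywhere near downstream systems.
raw = '{"invoice_id": "INV-1042", "vendor": "Acme", "total_cents": 129900}'
parsed = validate_llm_output(raw)
if parsed is not None:
    print("safe to hand off:", parsed.model_dump())
else:
    print("rejected: schema or business-rule violation")
```

The order matters. Parse, validate, and only then act.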
Why Retry Loops Are Evil
Uncontrolled retry loops are a special kind of hell.
Your agent calls a tool. The tool fails. The agent tries again. And again. And again. No limit. No backoff. Just endless repetition of the same mistake.
Logs show activity. Metrics show the agent is "working." But all it's doing is hammering a failing endpoint for hours.
The fix is simple. Set a maximum retry limit. Use exponential backoff. Add jitter so multiple agents don't retry at the exact same time.
But here's what most tutorials skip. You also need alternative paths. If tool A fails three times, maybe try tool B. Or escalate to a human. Or log the failure and move on.
Without those alternatives, your retry limit just changes the failure from infinite to finite. Still a failure.
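A bare-bones sketch of that pattern, assuming your tool calls are plain Python callables. The names are placeholders.

```python
import random
import time

def call_with_guardrails(primary, fallback=None, max_attempts=3, base_delay=1.0):
    """Try a tool a bounded number of times, then fall back or escalate."""
    for attempt in range(max_attempts):
        try:
            return primary()
        except Exception:
            if attempt == max_attempts - 1:
                break  # retry budget exhausted, stop hammering the endpoint
            # Exponential backoff plus jitter so parallel agents don't retry in lockstep.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
    if fallback is not None:
        return fallback()  # alternative path: a different tool, a cached answer, etc.
    raise RuntimeError("tool failed after retries -- escalate to a human")

# Usage: flaky_search and cached_search are placeholders for real tool calls.
# result = call_with_guardrails(flaky_search, fallback=cached_search)
```

The fallback is the important part. The retry budget just decides how fast you reach it.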
State Machines Save Lives
State machines feel old school. But they're perfect for agents.
Define what states your agent can be in. Define which transitions are allowed. Define which tools are available in each state.
Now your agent can't randomly decide to call any tool at any time. It can only use tools that make sense for its current state.
Traditional frameworks dump all tools and rules into context at once. The LLM sees everything. Gets confused. Makes bad choices.
State machines give you control. In state A, only tools X and Y are available. The agent moves to state B only when specific conditions are met. In state B, different tools become available.
This is how Parlant works. Conditional transitions. Tools bound to guidelines, available only when the guideline's condition activates.
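As a framework-agnostic sketch of the idea, with invented states and tool names, not Parlant's actual API:

```python
from enum import Enum, auto

class State(Enum):
    COLLECTING_INFO = auto()
    PROCESSING = auto()
    CONFIRMING = auto()

# Tools the agent is allowed to see in each state (names are illustrative).
TOOLS_BY_STATE = {
    State.COLLECTING_INFO: ["lookup_customer", "ask_clarifying_question"],
    State.PROCESSING: ["check_inventory", "create_order"],
    State.CONFIRMING: ["send_confirmation"],
}

# Legal transitions: no jumping straight to CONFIRMING.
TRANSITIONS = {
    State.COLLECTING_INFO: {State.PROCESSING},
    State.PROCESSING: {State.CONFIRMING},
    State.CONFIRMING: set(),
}

def available_tools(state: State) -> list[str]:
    return TOOLS_BY_STATE[state]

def transition(current: State, target: State) -> State:
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target

print(available_tools(State.COLLECTING_INFO))  # only two tools visible here
```

The LLM still picks among the tools it sees. It just never sees the ones that make no sense for the current step.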
The Supervision Layer
Multi-agent systems aren't just for scaling. They're for control.
One agent does the work. Another agent supervises. The supervisor checks every output against your rules before it ships.
Research backs this up. In tests with prompts designed to induce hallucinations, advanced models like GPT-4 and Llama3-70b caught the hallucinations and revised their outputs in 85% to 100% of cases.
The supervisor doesn't just look for bad words. It checks compliance with active guidelines. It verifies instructions were followed. If something's wrong, it sends it back for revision.
Supervision isn't about micromanagement. It's about accountability before the output reaches users.
Cost goes up. Latency increases. But your error rate drops. For production systems where trust matters, that trade-off works.
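A stripped-down sketch of the loop. The worker and supervisor here are stubs you'd replace with real LLM calls and your own guideline checks.

```python
MAX_REVISIONS = 2

def worker(task: str, feedback: str | None = None) -> str:
    # Placeholder: swap in a real LLM call that drafts (or revises) a response.
    return f"draft answer for: {task}"

def supervisor(task: str, draft: str, guidelines: list[str]) -> tuple[bool, str]:
    # Placeholder: swap in a second LLM call that checks the draft against your rules.
    return True, ""

def supervised_run(task: str, guidelines: list[str]) -> str:
    draft = worker(task)
    for _ in range(MAX_REVISIONS):
        approved, feedback = supervisor(task, draft, guidelines)
        if approved:
            return draft
        draft = worker(task, feedback=feedback)  # revise against the feedback
    # Still failing after the revision budget: don't ship it.
    raise RuntimeError("supervisor rejected the output -- route to a human")

print(supervised_run("summarize the refund policy", ["no invented policy terms"]))
```

The revision budget matters as much as the review. Without it, you've just built a fancier retry loop.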
The Thing About Tool Selection
Agents fail at picking the right tool more than you'd think.
You give your agent ten tools. It needs to read descriptions, match capabilities to the task, and predict which one will work. Based on limited context.
When tools have overlapping functionality, accuracy drops. When descriptions are ambiguous, the agent guesses.
And with multi-step workflows, it gets worse. The agent has to maintain state across tool calls. Handle errors. Adapt the plan based on what happened three steps ago.
Production logs show tool call accuracy degrades as workflow complexity increases.
The fix isn't better tool descriptions. It's fewer tools per state. Use your state machine to limit what's available. Don't show the agent ten tools when only two are relevant to its current task.
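Concretely, that can be as simple as an allowlist per step. A sketch with invented tool names; the specs would be whatever format your LLM SDK expects.

```python
# Hypothetical registry mapping tool names to whatever specs your LLM SDK expects.
TOOL_SPECS = {
    "lookup_customer": {"description": "Fetch a customer record by email."},
    "check_inventory": {"description": "Check stock level for a SKU."},
    "create_order": {"description": "Create an order from a validated cart."},
    "send_confirmation": {"description": "Email the customer an order summary."},
}

# Per-step allowlists: the model only ever sees what's relevant right now.
STEP_ALLOWLIST = {
    "identify_customer": ["lookup_customer"],
    "build_order": ["check_inventory", "create_order"],
    "wrap_up": ["send_confirmation"],
}

def tools_for_step(step: str) -> dict[str, dict]:
    """Slice the registry down before building the model request."""
    return {name: TOOL_SPECS[name] for name in STEP_ALLOWLIST[step]}

print(list(tools_for_step("build_order")))  # two tools, not the whole registry
```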
When Guardrails Get Annoying
Real talk. Guardrails add overhead.
Designing them is complex. Maintaining them is worse. Every time your requirements change, you update your validation schemas. Your state definitions. Your supervision rules.
Integration is messy, especially with legacy systems. Real-time validation adds latency. Comprehensive guardrails increase compute and memory requirements.
For small projects, it might be overkill. If you're building a personal tool that generates blog ideas, you probably don't need a multi-agent supervision pipeline.
But if your agent touches production data, customer interactions, or financial transactions, skip the guardrails and you will regret it.
The cost of implementation is annoying. The cost of an agent hallucinating wrong information to a customer is career-ending.
The Random Thing About Naming
Most agent frameworks use technical names. ValidationAgent. SupervisorAgent. OrchestratorService.
Boring. Functional. Forgettable.
I've seen teams name their agents after characters. Rick and Morty. Office characters. Anything that makes code reviews less soul-crushing.
One team named their validator "Dwight" because it's overly strict and follows rules to an absurd degree. Their creative agent was "Michael" because it tries too hard and sometimes says inappropriate things.
Doesn't improve performance. But it makes the work less tedious. And when you're debugging retry loops at midnight, every bit of joy helps.
You Still Won't Get Perfect Control
Even with schemas, state machines, supervision layers, and retry limits, agents will surprise you.
The randomness is baked in. Temperature settings help. Lower means more predictable. But even at zero, you get variations.
Context overload happens. You add more rules to fix edge cases. Your prompt gets longer. The agent gets confused. Rule conflicts emerge. Behavior becomes less predictable, not more.
Maintenance never ends. New edge cases appear. Adversarial inputs you didn't test for. Your agent confidently handles 95% of cases. That last 5% keeps you up at night.
Most agents don't need to be perfect. They need to be safe. And when they're not sure, they need to escalate instead of guess.
Guardrails aren't about eliminating chaos. They're about containing it. Your agent can be creative and autonomous within boundaries. Just not outside them.
Where This Leaves You
Agents are powerful. Also unpredictable. Also your responsibility when they break.
The developers building reliable agent systems aren't using magic prompts. They're using boring engineering. Validation layers. State machines. Retry limits. Supervision.
Start small. Add one guardrail. Test it. See if output improves. Then add another.
You won't get deterministic behavior from a probabilistic model. But you can get predictable boundaries. And sometimes that's enough.