
How to Write AI Agent Specs That Don't Fail

Dishant Sharma
Jan 14th, 2026
6 min read

GitHub released an open-source toolkit called Spec Kit in September 2025. Within weeks, developers on Reddit were arguing about whether specs actually help or just create more work. The answer? Both, depending on how you write them.

Spec-driven AI agents aren't doing "vibe coding" anymore. They need clear instructions. Not vague prompts like "build me a task app." Real specifications that act like blueprints. The difference between an agent that works and one that hallucinates function names comes down to how well you spec it.

What spec-driven even means

Traditional coding with AI agents feels like throwing darts blindfolded. You describe something, get code back, and it looks right but doesn't compile. Or it compiles but misses your actual intent.

Spec-driven flips this. The specification becomes the source of truth. Not a document you write once and forget. A living artifact that drives implementation, testing, and task breakdowns.

The process has four phases: Specify (what you're building and why), Plan (technical stack and architecture), Tasks (the implementation broken down into pieces), and Implement (the agent codes while you review focused changes).

You don't move to the next phase until the current one is validated.

GitHub analyzed over 2,500 agent configuration files. Found a pattern. The best specs cover six areas: commands, testing, project structure, code style, git workflow, and boundaries.
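Here's what a spec file covering those six areas might look like in skeleton form. This is an illustrative sketch, not GitHub's template, and the commands are placeholders for whatever your project actually uses:

```markdown
# AGENTS.md (illustrative skeleton)

## Commands
- Build: `npm run build` · Test: `npm test` (swap in your project's real commands)

## Testing
- Add or update tests for every behavior change; never skip failing tests

## Project structure
- Application code in `src/`, tests in `tests/`, one component per file

## Code style
- Follow the repo's linter config; prefer small, focused functions

## Git workflow
- Small commits, imperative messages, one concern per pull request

## Boundaries
- Always / ask first / never rules (more on these below)
```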

The context window problem

Addy Osmani wrote about this in December 2024. You can't just dump a massive spec at an agent and expect magic. Context limits exist. The model's "attention budget" breaks down.

I've seen developers write specs that rival RFCs. Thousands of words. Every edge case documented. Then they wonder why the agent ignores half the requirements.

Research confirms it. There's something called the "curse of instructions". Pile on ten detailed rules and the AI follows maybe three. Performance drops significantly as you add more simultaneous requirements.

Better strategy? Iterative focus. One task at a time.

Breaking work into chunks

JetBrains built this into their AI agent Junie. Instead of "build an entire feature in one go," you guide it through small, scoped patches. You stay in control by reviewing each change without losing perspective.

Developers on Reddit call this "narrowing the scope". If you need a blog post translated to Portuguese, use one LLM call for generation and another for translation. Simplifies testing and debugging.
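Here's a minimal sketch of that split in Python. The `call_llm` helper is a stand-in for whatever model client you actually use, and the prompts are illustrative:

```python
# Two narrow LLM calls instead of one broad one.
# call_llm is a placeholder, not a real library function.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your model provider")

def write_post_in_portuguese(topic: str) -> str:
    # Call 1: generate the post in English only.
    draft = call_llm(f"Write a short blog post about: {topic}")

    # Call 2: translate the finished draft. Each call can now be
    # tested, logged, and retried on its own.
    return call_llm(f"Translate this blog post into Portuguese:\n\n{draft}")
```

If the translation comes back wrong, you know exactly which call to debug.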

One engineer accidentally ran an LLM call too many times with large inputs. Cost the client a fortune. Definitely a learning experience, they said.

Anthropic's guide recommends decomposing complex requirements into sequential, simple instructions. Not everything at once.

The three-tier boundary system

GitHub's research found something interesting. The most effective specs use three tiers for boundaries.

Always do: "Always run tests before commits" or "Always follow naming conventions".

Ask first: "Ask before modifying database schemas" or "Ask before adding new dependencies".

Never do: "Never commit secrets or API keys". This was the single most common constraint in the study.

Hard stops matter. One Reddit user said they make their agents.md "quite rigorous to prevent it from taking on random tasks, which it tends to do".
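In a spec file, those tiers can be as plain as a short boundaries section. The wording below is illustrative, not a prescribed format:

```markdown
## Boundaries

### Always
- Run the test suite before committing
- Follow the existing naming conventions

### Ask first
- Modifying database schemas
- Adding new dependencies

### Never
- Commit secrets or API keys
- Take on tasks outside the current spec
```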

Why models forget

LLMs drift over long conversations. They're pattern completion engines, not mind readers. When you write "add photo sharing to my app," the model guesses at thousands of unstated requirements.

Some guesses are wrong. You won't discover which until deep into implementation.

This is why the spec becomes executable. It's not that documentation suddenly became more important. It's that AI makes specifications actually work.

The weird intern analogy

Simon Willison compared working with AI agents to "a very weird form of management". Getting good results feels uncomfortably close to managing a human intern.

You provide clear instructions. Ensure they have necessary context. Give actionable feedback.

He also said agents are "fiercely competent at Git". Which is true. Models can read diffs surprisingly well.

But they'll also cheat if you give them the chance. The spec prevents that cheating. Keeps them on task.

Testing makes or breaks it

A Reddit post from January 2026 argued the most underrated skill for building AI agents isn't prompting. It's error handling.

Advanced models like GPT-4 or Claude sometimes produce incorrect JSON. Omit necessary fields. Invent fictional function names. Many guides focus on ideal scenarios where everything works.

Reality? Developers spend more time managing "what if it fails" situations than building core logic.
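A rough sketch of what that defensive layer looks like: validate the model's output before acting on it, and treat a bad response as a normal case rather than a crash. The field names and the `ALLOWED_TOOLS` set here are made up for illustration:

```python
import json
from typing import Optional

# Hypothetical whitelist of tools the agent is actually allowed to call.
ALLOWED_TOOLS = {"read_file", "write_file", "run_tests"}

def parse_tool_call(raw: str) -> Optional[dict]:
    """Return a validated tool call, or None if the model's output is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed JSON: retry or fall back instead of crashing

    # Guard against missing fields and invented function names.
    if not isinstance(data, dict) or "tool" not in data or "arguments" not in data:
        return None
    if data["tool"] not in ALLOWED_TOOLS:
        return None

    return data
```

The caller can then retry with a corrective prompt, fall back to a simpler path, or hand off to a human instead of executing whatever came back.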

Willison noted that having a robust test suite gives agents superpowers. They validate and iterate quickly when tests fail.

The random thing about naming

I still think about project names more than I should.

Whether you're building a finance app or a podcast tool, the name matters. Not for the spec. Not for the agent. For you.

Because you'll say that name hundreds of times while debugging. While explaining to someone why the agent rewrote your entire auth system when you asked it to "just add a logout button."

Names should be short. Easy to type. Not embarrassing to say out loud when your agent breaks production at 2am.

This has nothing to do with spec-driven development. But it matters anyway.

Most people don't need this

Spec-driven development is overkill for small projects. If you're building a simple landing page, just tell the agent what you want. Done.

This approach shines in three scenarios. Greenfield projects where you're starting from zero. Feature work in existing complex codebases. Legacy modernization where original intent is lost.

For everything else? Maybe not worth the overhead.

One developer on Reddit built a Twitter clone using specs. Took about ten minutes for the agent to generate specifications and plan automatically.

But that's with experience. First time? You'll spend hours writing specs only to realize the agent misunderstood anyway.

The learning curve is real. You're essentially learning a new form of technical writing. One that's precise enough for a literal-minded AI but flexible enough to evolve.

When vibe coding wins

Sometimes you just want to prototype fast. Try an idea. See if it works.

Specs slow that down. They force you to think ahead. Define boundaries. Plan architecture.

That's good for production systems. Bad for "I wonder if this is even possible" experiments.

Know when to skip the process. Not everything needs a formal specification.

Where this goes next

GitHub's team is exploring VS Code integrations. Ways to make spec-driven workflows feel natural inside the editor.

They're also looking at comparing and diffing multiple implementations. Iterating between versions opens creative possibilities.

The Agentic AI Foundation is standardizing protocols like MCP for tool integration. Specs that follow these patterns are easier for agents to consume reliably.

Context engineering is evolving too. RAG patterns let agents pull relevant spec chunks from vector databases on the fly instead of loading everything at once.
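A bare-bones sketch of that retrieval step, with a hypothetical `embed` function standing in for a real embedding model and a simple top-k similarity ranking standing in for a vector database:

```python
# Pull only the spec sections relevant to the current task into context.

def embed(text: str) -> list[float]:
    raise NotImplementedError("placeholder for your embedding model")

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm if norm else 0.0

def relevant_chunks(task: str, spec_sections: list[str], k: int = 3) -> list[str]:
    """Rank spec sections by similarity to the task and keep only the top k."""
    task_vec = embed(task)
    ranked = sorted(spec_sections, key=lambda s: cosine(embed(s), task_vec), reverse=True)
    return ranked[:k]  # only these chunks go into the agent's context window
```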

Multiple agents running in parallel is "the next big thing" according to Willison. Surprisingly effective, if mentally exhausting.

One agent codes while another tests. Or separate components get built concurrently. The trick is scoping tasks so agents don't conflict.

I tried running three agents once. Two worked fine. The third decided to refactor code the first agent just wrote. They fought each other for twenty minutes before I killed the process.

But when it works? Fast. Really fast.
