How to Write a Good Spec for AI Agents – O’Reilly

This post first appeared on Addy Osmani’s Elevate Substack newsletter and is being republished here with the author’s permission.

TL;DR: Aim for a clear spec covering just enough nuance (this may include structure, style, testing, boundaries. . .) to guide the AI without overwhelming it. Break large tasks into smaller ones rather than keeping everything in one large prompt. Plan first in read-only mode, then execute and iterate continuously.

“I’ve heard a lot about writing good specs for AI agents, but haven’t found a solid framework yet. I could write a spec that rivals an RFC, but at some point the context is too large and the model breaks down.”

Many developers share this frustration. Simply throwing a large spec at an AI agent doesn’t work: Context window limits and the model’s “attention budget” get in the way. The key is to write smart specs: documents that guide the agent clearly, stay within practical context sizes, and evolve with the project. This guide distills best practices from my use of coding agents including Claude Code and Gemini CLI into a framework for spec-writing that keeps your AI agents focused and productive.

We’ll cover five principles for great AI agent specs, each starting with a bolded takeaway.

1. Start with a High-Level Vision and Let the AI Draft the Details

Kick off your project with a concise high-level spec, then have the AI expand it into a detailed plan.

Instead of overengineering upfront, begin with a clear goal statement and a few core requirements. Treat this as a “product brief” and let the agent generate a more elaborate spec from it. This leverages the AI’s strength in elaboration while you keep control of the direction. This works well unless you already know you have very specific technical requirements that must be met from the start.

Why this works: LLM-based agents excel at fleshing out details when given a solid high-level directive, but they need a clear mission to avoid drifting off course. By providing a short outline or objective description and asking the AI to produce a full specification (e.g., a spec.md), you create a persistent reference for the agent. Planning upfront matters even more with an agent: You can iterate on the plan first, then hand it off to the agent to write the code. The spec becomes the first artifact you and the AI build together.

Practical approach: Start a new coding session by prompting

You are an AI software engineer. Draft a detailed specification for [project X] covering objectives, features, constraints, and a step-by-step plan.

Keep your initial prompt high-level: e.g., “Build a web app where users can track tasks (to-do list), with user accounts, a database, and a simple UI.”

The agent might respond with a structured draft spec: an overview, feature list, tech stack suggestions, data model, and so on. This spec then becomes the “source of truth” that both you and the agent can refer back to. GitHub’s AI team promotes spec-driven development where “specs become the shared source of truth…living, executable artifacts that evolve with the project.” Before writing any code, review and refine the AI’s spec. Make sure it aligns with your vision, and correct any hallucinations or off-target details.

Use Plan Mode to enforce planning first: Tools like Claude Code offer a Plan Mode that restricts the agent to read-only operations. It can analyze your codebase and create detailed plans but won’t write any code until you’re ready. This is ideal for the planning phase: Start in Plan Mode (Shift+Tab in Claude Code), describe what you want to build, and let the agent draft a spec while exploring your existing code. Ask it to clarify ambiguities by questioning you about the plan. Have it review the plan for architecture, best practices, security risks, and testing strategy. The goal is to refine the plan until there’s no room for misinterpretation. Only then do you exit Plan Mode and let the agent execute. This workflow prevents the common trap of jumping straight into code generation before the spec is solid.

Use the spec as context: Once approved, save this spec (e.g., as SPEC.md) and feed relevant sections into the agent as needed. Many developers using a strong model do exactly this. The spec file persists between sessions, anchoring the AI whenever work resumes on the project. This mitigates the forgetfulness that can happen when the conversation history gets too long or when you have to restart an agent. It’s akin to how one would use a product requirements document (PRD) in a team: a reference that everyone (human or AI) can consult to stay on track. Experienced people often “write good documentation first and the model may be able to build the matching implementation from that input alone,” as one engineer observed. The spec is that documentation.

Keep it goal-oriented: A high-level spec for an AI agent should focus on what and why more than the nitty-gritty how (at least initially). Think of it like the user story and acceptance criteria: Who is the user? What do they need? What does success look like? (For example, “User can add, edit, complete tasks; data is saved persistently; the app is responsive and secure.”) This keeps the AI’s detailed spec grounded in user needs and outcomes, not just technical to-dos. As the GitHub Spec Kit docs put it, provide a high-level description of what you’re building and why, and let the coding agent generate a detailed specification focusing on user experience and success criteria. Starting with this big-picture vision prevents the agent from losing sight of the forest for the trees when it later gets into coding.

2. Structure the Spec Like a Professional PRD (or SRS)

Treat your AI spec as a structured document (PRD) with clear sections, not a loose pile of notes.

Many developers treat specs for agents much like traditional product requirement documents (PRDs) or system design docs: comprehensive, well-organized, and easy for a “literal-minded” AI to parse. This formal approach gives the agent a blueprint to follow and reduces ambiguity.

The six core areas

GitHub’s analysis of over 2,500 agent configuration files revealed a clear pattern: The most effective specs cover six areas. Use this as a checklist for completeness:

Commands: Put executable commands early, and include not just tool names but full commands with flags: npm test, pytest -v, npm run build. The agent will reference these constantly.
Testing: How to run tests, what framework you use, where test files live, and what coverage expectations exist.
Project structure: Where source code lives, where tests go, where docs belong. Be explicit: “src/ for application code, tests/ for unit tests, docs/ for documentation.”
Code style: One real code snippet showing your style beats three paragraphs describing it. Include naming conventions, formatting rules, and examples of good output.
Git workflow: Branch naming, commit message format, PR requirements. The agent can follow these if you spell them out.
Boundaries: What the agent should never touch (secrets, vendor directories, production configs, specific folders). “Never commit secrets” was the single most common helpful constraint in the GitHub study.
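As a rough illustration, the six-area checklist can be turned into a pre-flight check that runs before a spec is handed to an agent. This is a minimal sketch, assuming each area appears as its own Markdown heading with exactly these names; the heading names, the `missing_areas` helper, and the exact-match rule are assumptions for illustration, not something GitHub’s analysis prescribes.

```python
from __future__ import annotations

# Hypothetical heading names for the six areas; adjust to your own template.
REQUIRED_AREAS = (
    "commands",
    "testing",
    "project structure",
    "code style",
    "git workflow",
    "boundaries",
)


def missing_areas(spec_text: str) -> list[str]:
    """Return the required areas that have no matching Markdown heading."""
    headings = {
        line.lstrip("#").strip().lower()  # "## Code Style" -> "code style"
        for line in spec_text.splitlines()
        if line.startswith("#")
    }
    return [area for area in REQUIRED_AREAS if area not in headings]


if __name__ == "__main__":
    spec = "# Project Spec\n## Commands\n## Testing\n## Boundaries\n"
    print(missing_areas(spec))  # ['project structure', 'code style', 'git workflow']
```

Exact heading matching is deliberately strict; a looser check (synonyms, substring matches) would trade false alarms for missed gaps.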

Be specific about your stack: Say “React 18 with TypeScript, Vite, and Tailwind CSS,” not “React project.” Include versions and key dependencies. Vague specs produce vague code.

Use a consistent format: Clarity is king. Many devs use Markdown headings or even XML-like tags in the spec to delineate sections, because AI models handle well-structured text better than free-form prose. For example, you might structure the spec as:

# Project Spec: My team's tasks app

## Objective
- Build a web app for small teams to manage tasks...

## Tech Stack
- React 18+, TypeScript, Vite, Tailwind CSS
- Node.js/Express backend, PostgreSQL, Prisma ORM

## Commands
- Build: `npm run build` (compiles TypeScript, outputs to dist/)
- Test: `npm test` (runs Jest, must pass before commits)
- Lint: `npm run lint -- --fix` (auto-fixes ESLint errors)

## Project Structure
- `src/` – Application source code
- `tests/` – Unit and integration tests
- `docs/` – Documentation

## Boundaries
- ✅ Always: Run tests before commits, follow naming conventions
- ⚠️ Ask first: Database schema changes, adding dependencies
- 🚫 Never: Commit secrets, edit node_modules/, modify CI config

This level of organization not only helps you think clearly but also helps the AI find information. Anthropic engineers recommend organizing prompts into distinct sections (for example, delimited by XML-style tags) for exactly this reason: It gives the model strong cues about which information is which. And remember, “minimal does not necessarily mean short.” Don’t shy away from detail in the spec if it matters, but keep it focused.

Integrate specs into your toolchain: Treat specs as “executable artifacts” tied to version control and CI/CD. The GitHub Spec Kit uses a four-phase gated workflow that makes your specification the center of your engineering process. Instead of writing a spec and setting it aside, the spec drives the implementation, checklists, and task breakdowns. Your primary role is to steer; the coding agent does the bulk of the writing. Each phase has a specific job, and you don’t move to the next one until the current task is fully validated:

1. Specify: You provide a high-level description of what you’re building and why, and the coding agent generates a detailed specification. This isn’t about technical stacks or app design; it’s about user journeys, experiences, and what success looks like. Who will use this? What problem does it solve? How will they interact with it? Think of it as mapping the user experience you want to create and letting the coding agent flesh out the details. This becomes a living artifact that evolves as you learn more.

2. Plan: Now you get technical. You provide your desired stack, architecture, and constraints, and the coding agent generates a comprehensive technical plan. If your company standardizes on certain technologies, this is where you say so. If you’re integrating with legacy systems or have compliance requirements, all of that goes here. You can ask for multiple plan variations to compare approaches. If you make internal docs available, the agent can integrate your architectural patterns directly into the plan.

3. Tasks: The coding agent takes the spec and plan and breaks them into actual work: small, reviewable chunks that each solve a specific piece of the puzzle. Each task should be something you can implement and test in isolation, almost like test-driven development for your AI agent. Instead of “build authentication,” you get concrete tasks like “create a user registration endpoint that validates email format.”

4. Implement: Your coding agent tackles tasks one by one (or in parallel). Instead of reviewing thousand-line code dumps, you review focused changes that solve specific problems. The agent knows what to build (specification), how to build it (plan), and what to work on (task). Crucially, your role is to verify at each phase: Does the spec capture what you want? Does the plan account for constraints? Are there edge cases the AI missed? The process builds in checkpoints for you to critique, spot gaps, and course-correct before moving forward.
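The gating rule (no phase starts until the previous one is validated) can be modeled as a tiny state machine. This is a sketch of that rule only, assuming a human calls `approve()` after reviewing each phase; it does not reproduce Spec Kit’s own commands or files, and `GatedWorkflow` and `GateError` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field

PHASES = ("specify", "plan", "tasks", "implement")


class GateError(RuntimeError):
    """Raised when a phase is approved out of order."""


@dataclass
class GatedWorkflow:
    validated: list[str] = field(default_factory=list)

    def next_phase(self) -> str | None:
        """The phase currently awaiting human validation, or None when all are done."""
        if len(self.validated) == len(PHASES):
            return None
        return PHASES[len(self.validated)]

    def approve(self, phase: str) -> None:
        """Record human sign-off; refuses to skip ahead of the current gate."""
        expected = self.next_phase()
        if phase != expected:
            raise GateError(f"cannot approve {phase!r}; current gate is {expected!r}")
        self.validated.append(phase)


if __name__ == "__main__":
    wf = GatedWorkflow()
    wf.approve("specify")
    try:
        wf.approve("tasks")  # skipping "plan" is rejected
    except GateError as err:
        print(err)  # cannot approve 'tasks'; current gate is 'plan'
```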

This gated workflow prevents what Willison calls “house of cards code”: fragile AI outputs that collapse under scrutiny. Anthropic’s Skills system offers a similar pattern, letting you define reusable Markdown-based behaviors that agents invoke. By embedding your spec in these workflows, you ensure the agent can’t proceed until the spec is validated, and changes propagate automatically to task breakdowns and tests.

Consider agents.md for specialized personas: For tools like GitHub Copilot, you can create agents.md files that define specialized agent personas, such as a @docs-agent for technical writing, a @test-agent for QA, and a @security-agent for code review. Each file acts as a focused spec for that persona’s behavior, commands, and boundaries. This is particularly useful when you want different agents for different tasks rather than one general-purpose assistant.

Design for agent experience (AX): Just as we design APIs for developer experience (DX), consider designing specs for “agent experience.” This means clean, parseable formats: OpenAPI schemas for any APIs the agent will consume, llms.txt files that summarize documentation for LLM consumption, and explicit type definitions. The Agentic AI Foundation (AAIF) is standardizing protocols like MCP (Model Context Protocol) for tool integration. Specs that follow these patterns are easier for agents to consume and act on reliably.

PRD versus SRS mindset: It helps to borrow from established documentation practices. For AI agent specs, you’ll often blend these into one document (as illustrated above), but covering both angles serves you well. Writing it like a PRD ensures you include user-centric context (“the why behind each feature”) so the AI doesn’t optimize for the wrong thing. Expanding it like an SRS ensures you nail down the specifics the AI will need to actually generate correct code (like what database or API to use). Developers have found that this extra upfront effort pays off by drastically reducing miscommunication with the agent later.

Make the spec a “living document”: Don’t write it and forget it. Update the spec as you and the agent make decisions or discover new information. If the AI had to change the data model or you decided to cut a feature, reflect that in the spec so it remains the ground truth. Think of it as version-controlled documentation. In spec-driven workflows, the spec drives implementation, tests, and task breakdowns, and you don’t move to coding until the spec is validated. This habit keeps the project coherent, especially if you or the agent step away and come back later. Remember, the spec isn’t just for the AI; it helps you as the developer maintain oversight and ensure the AI’s work meets the real requirements.

3. Break Tasks into Modular Prompts and Context, Not One Big Prompt

Divide and conquer: Give the AI one focused task at a time rather than a monolithic prompt with everything at once.

Experienced AI engineers have learned that trying to stuff the entire project (all requirements, all code, all instructions) into a single prompt or agent message is a recipe for confusion. Not only do you risk hitting token limits; you also risk the model losing focus due to the “curse of instructions,” where too many directives cause it to follow none of them well. The solution is to design your spec and workflow in a modular way, tackling one piece at a time and pulling in only the context needed for that piece.

The curse of too much context/instructions: Research has confirmed what many devs observed anecdotally: As you pile more instructions or data into the prompt, the model’s performance in adhering to each one drops significantly. One study dubbed this the “curse of instructions,” showing that even GPT-4 and Claude struggle when asked to satisfy many requirements simultaneously. In practical terms, if you present 10 bullet points of detailed rules, the AI might obey the first few and start overlooking others. The better strategy is iterative focus. Guidelines from industry suggest decomposing complex requirements into sequential, simple instructions as a best practice. Focus the AI on one subproblem at a time, get that done, then move on. This keeps quality high and errors manageable.

Divide the spec into phases or components: If your spec document is very long or covers a lot of ground, consider splitting it into parts (either physically separate files or clearly separate sections). For example, you might have a section for “backend API spec” and another for “frontend UI spec.” You don’t need to always feed the frontend spec to the AI when it’s working on the backend, and vice versa. Many devs using multi-agent setups even create separate agents or subprocesses for each part (e.g., one agent works on database/schema, another on API logic, another on frontend, each with the relevant slice of the spec). Even if you use a single agent, you can emulate this by copying only the relevant spec section into the prompt for that task. Avoid context overload: Don’t mix authentication tasks with database schema changes in one go, as the DigitalOcean AI guide warns. Keep each prompt tightly scoped to the current goal.

Extended TOC/summaries for large specs: One clever technique is to have the agent build an extended table of contents with summaries for the spec. This is essentially a “spec summary” that condenses each section into a few key points or keywords and references where details can be found. For example, if your full spec has a section on security requirements spanning 500 words, you might have the agent summarize it to: “Security: Use HTTPS, protect API keys, implement input validation (see full spec §4.2).” By creating a hierarchical summary in the planning phase, you get a bird’s-eye view that can stay in the prompt, while the fine details remain offloaded until needed. This extended TOC acts as an index: The agent can consult it and say, “Aha, there’s a security section I should look at,” and you can then provide that section on demand. It’s similar to how a human developer skims an outline and then flips to the relevant page of a spec document when working on a specific part.

To implement this, you might prompt the agent after writing the spec: “Summarize the spec above into a very concise outline with each section’s key points and a reference tag.” The result might be a list of sections with one- or two-sentence summaries. That summary can be kept in the system or assistant message to guide the agent’s focus without eating up too many tokens. This hierarchical summarization approach is known to help LLMs maintain long-term context by focusing on the high-level structure. The agent carries a “mental map” of the spec.
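To make the shape of such an index concrete, here is a crude, non-LLM stand-in: It tags each `## ` section with a reference number and uses the section’s first line as its gist. In the workflow described above the agent writes real summaries; `build_toc`, the `[§n]` tag format, and the first-line heuristic are assumptions chosen only to show what the compact outline looks like.

```python
from __future__ import annotations


def build_toc(spec_text: str, max_chars: int = 80) -> list[str]:
    """Condense each '## ' section to one line of the form '[§n] Title: gist'."""
    sections: list[tuple[str, list[str]]] = []
    for line in spec_text.splitlines():
        if line.startswith("## "):
            sections.append((line[3:].strip(), []))
        elif sections and line.strip():
            # Drop leading '-' bullet characters and spaces before storing the line.
            sections[-1][1].append(line.strip().lstrip("- "))
    toc = []
    for n, (title, body) in enumerate(sections, start=1):
        gist = body[0] if body else "(empty)"
        if len(gist) > max_chars:
            gist = gist[: max_chars - 3] + "..."
        toc.append(f"[§{n}] {title}: {gist}")
    return toc


if __name__ == "__main__":
    spec = "# Tasks app\n\n## Security\n- Use HTTPS\n- Protect API keys\n\n## Data Model\n- Tasks belong to users\n"
    print("\n".join(build_toc(spec)))
```

The resulting lines are short enough to stay in the prompt permanently, while the full section bodies are supplied only when a task needs them.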

Utilize subagents or “skills” for different spec parts: Another advanced approach is using multiple specialized agents (what Anthropic calls subagents, or what you might call “skills”). Each subagent is configured for a specific area of expertise and given the portion of the spec relevant to that area. For instance, you might have a database designer subagent that only knows about the data model section of the spec, and an API coder subagent that knows the API endpoints spec. The main agent (or an orchestrator) can route tasks to the appropriate subagent automatically.

The benefit is that each agent has a smaller context window to deal with and a more focused role, which can boost accuracy and allow parallel work on independent tasks. Anthropic’s Claude Code supports this by letting you define subagents with their own system prompts and tools. “Each subagent has a specific purpose and expertise area, uses its own context window separate from the main conversation, and has a custom system prompt guiding its behavior,” as their docs describe. When a task comes up that matches a subagent’s domain, Claude can delegate that task to it, with the subagent returning results independently.
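The routing idea can be sketched in a few lines. This is only an illustration under stated assumptions: Claude Code decides delegation from each subagent’s description, whereas this sketch uses plain keyword matching, and `Subagent`, `route`, and the example keywords are hypothetical. What it does show is that each subagent receives only its own slice of the spec.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Subagent:
    name: str
    keywords: tuple[str, ...]
    spec_slice: str  # only the part of the spec this subagent needs


def route(task: str, agents: list[Subagent], default: str = "main") -> tuple[str, str]:
    """Return (agent name, context) for the first subagent whose keywords appear in the task."""
    text = task.lower()
    for agent in agents:
        if any(keyword in text for keyword in agent.keywords):
            return agent.name, agent.spec_slice
    # No specialist matched: the main agent handles it without a spec slice.
    return default, ""


# Hypothetical subagents, each holding one section of the spec.
AGENTS = [
    Subagent("db-designer", ("schema", "migration", "table"), "## Data Model\n- Task(id, title, owner_id)"),
    Subagent("api-coder", ("endpoint", "route"), "## API\n- POST /tasks creates a task"),
]

if __name__ == "__main__":
    print(route("Add a migration for the tasks table", AGENTS)[0])  # db-designer
```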

Parallel agents for throughput: Running multiple agents simultaneously is emerging as “the next big thing” for developer productivity. Rather than waiting for one agent to finish before starting another task, you can spin up parallel agents for non-overlapping work. Willison describes this as “embracing parallel coding agents” and notes it’s “surprisingly effective, if mentally exhausting.” The key is scoping tasks so agents don’t step on each other: One agent codes a feature while another writes tests, or separate components get built concurrently. Orchestration frameworks like LangGraph or OpenAI Swarm can help coordinate these agents, and shared memory via vector databases (like Chroma) lets them access common context without redundant prompting.
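One way to enforce the “don’t step on each other” rule is to give each parallel agent an explicit file scope and refuse to start if scopes overlap. This sketch assumes that convention; `run_agent` is a hypothetical placeholder that only sleeps, standing in for whatever actually launches an agent, and the code does not use LangGraph, Swarm, or any other framework’s API.

```python
from __future__ import annotations

import asyncio


def check_disjoint(scopes: dict[str, set[str]]) -> None:
    """Raise ValueError if any file is claimed by more than one agent."""
    owner: dict[str, str] = {}
    for agent, paths in scopes.items():
        for path in paths:
            if path in owner:
                raise ValueError(f"{path} claimed by both {owner[path]} and {agent}")
            owner[path] = agent


async def run_agent(name: str, paths: set[str]) -> str:
    # Hypothetical placeholder for a real agent session; sleeps to simulate work.
    await asyncio.sleep(0.01)
    return f"{name} finished {len(paths)} file(s)"


async def run_parallel(scopes: dict[str, set[str]]) -> list[str]:
    """Validate scopes, then run all agents concurrently; results keep input order."""
    check_disjoint(scopes)
    return await asyncio.gather(*(run_agent(name, paths) for name, paths in scopes.items()))


if __name__ == "__main__":
    print(asyncio.run(run_parallel({"feature": {"src/tasks.ts"}, "tests": {"tests/tasks.test.ts"}})))
```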

Single versus multi-agent: When to use each

	Single agent parallel	Multi-agent
Strengths	Simpler setup; lower overhead; easier to debug and follow	Higher throughput; handles complex interdependencies; specialists per domain
Challenges	Context overload on big projects; slower iteration; single point of failure	Coordination overhead; potential conflicts; needs shared memory (e.g., vector DBs)
Best for	Isolated modules; small-to-medium projects; early prototyping	Large codebases; one codes + one tests + one reviews; independent features
Tips	Use spec summaries; refresh context per task; start fresh sessions often	Limit to 2–3 agents initially; use MCP for tool sharing; define clear boundaries

In practice, using subagents or skill-specific prompts might look like this: You maintain multiple spec files (or prompt templates), such as SPEC_backend.md and SPEC_frontend.md, and you tell the AI, “For backend tasks, refer to SPEC_backend; for frontend tasks, refer to SPEC_frontend.” Or in a tool like Cursor/Claude, you actually spin up a subagent for each. This is certainly more complex to set up than a single-agent loop, but it mimics what human developers do: We mentally compartmentalize a large spec into relevant chunks. (You don’t keep the whole 50-page spec in your head at once; you recall the part you need for the task at hand and have a general sense of the overall architecture.) The challenge, as noted, is managing interdependencies: The subagents must still coordinate. (The frontend needs to know the API contract from the backend spec, etc.) A central overview (or an “architect” agent) can help by referencing the subspecs and ensuring consistency.

Focus each prompt on one task/section: Even without fancy multi-agent setups, you can manually enforce modularity. For example, after the spec is written, your next move might be: “Step 1: Implement the database schema.” You feed the agent only the database section of the spec, plus any global constraints from the spec (like the tech stack). The agent works on that. Then for Step 2, “Now implement the authentication feature,” you provide the auth section of the spec and maybe the relevant parts of the schema if needed. By refreshing the context for each major task, you ensure the model isn’t carrying a lot of stale or irrelevant information that could distract it. As one guide suggests: “Start fresh: begin new sessions to clear context when switching between major features.” You can always remind the agent of critical global rules (from the spec’s constraints section) each time, but don’t shove the entire spec in if it’s not all needed.
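The per-step routine (global constraints plus one relevant section) is easy to script. A minimal sketch, assuming the spec uses `## ` headings like the example template earlier; `section`, `task_prompt`, and the choice of “Tech Stack” and “Boundaries” as the always-included sections are assumptions for illustration. In practice the spec text would be read from SPEC.md rather than inlined.

```python
from __future__ import annotations


def section(spec_text: str, heading: str) -> str:
    """Body of the '## <heading>' section (case-insensitive), up to the next '## ' heading."""
    body: list[str] = []
    inside = False
    for line in spec_text.splitlines():
        if line.startswith("## "):
            if inside:
                break  # reached the next section
            inside = line[3:].strip().lower() == heading.strip().lower()
            continue
        if inside:
            body.append(line)
    return "\n".join(body).strip()


def task_prompt(
    spec_text: str,
    task: str,
    task_sections: list[str],
    global_sections: tuple[str, ...] = ("Tech Stack", "Boundaries"),
) -> str:
    """Task line, then the always-on global rules, then only the sections this task needs."""
    parts = [f"Task: {task}"]
    for heading in (*global_sections, *task_sections):
        body = section(spec_text, heading)
        if body:
            parts.append(f"## {heading}\n{body}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    demo_spec = (
        "# Spec\n## Tech Stack\n- Node.js, PostgreSQL\n## Data Model\n- Task(id, title)\n"
        "## Auth\n- JWT sessions\n## Boundaries\n- Never commit secrets\n"
    )
    print(task_prompt(demo_spec, "Step 1: Implement the database schema", ["Data Model"]))
```

The Auth section is left out of the Step 1 prompt entirely; Step 2 would pass `["Auth", "Data Model"]` instead.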

Use in-line directives and code TODOs: Another modularity trick is to use your code or spec as an active part of the conversation. For instance, scaffold your code with // TODO comments that describe what needs to be done, and have the agent fill them in one by one. Each TODO essentially acts as a mini-spec for a small task. This keeps the AI laser focused (“implement this specific function according to this spec snippet”), and you can iterate in a tight loop. It’s similar to giving the AI one checklist item to complete rather than the whole checklist at once.
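A small script can feed the TODOs to the agent one at a time. This sketch assumes `// TODO` comments (as in TypeScript or Go source) and uses an illustrative prompt wording; `todo_tasks` and `next_prompt` are hypothetical helpers, not features of any particular agent.

```python
from __future__ import annotations

import re

# Matches "// TODO: text" or "// TODO text"; the description is captured.
TODO_RE = re.compile(r"//\s*TODO:?\s*(.+)")


def todo_tasks(source: str) -> list[tuple[int, str]]:
    """Return (1-based line number, description) for each // TODO, in file order."""
    tasks = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = TODO_RE.search(line)
        if match:
            tasks.append((lineno, match.group(1).strip()))
    return tasks


def next_prompt(source: str, filename: str) -> str | None:
    """Build a prompt scoped to the first remaining TODO, or None when none are left."""
    tasks = todo_tasks(source)
    if not tasks:
        return None
    lineno, desc = tasks[0]
    return f"In {filename}, implement only the TODO on line {lineno}: {desc}. Leave the other TODOs untouched."


if __name__ == "__main__":
    src = (
        "export function validateEmail(s: string): boolean {\n"
        "  // TODO: reject addresses without an @ and a dot in the domain\n"
        "  return true;\n"
        "}\n"
        "// TODO add unit tests\n"
    )
    print(next_prompt(src, "src/validate.ts"))
```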

The bottom line: Small, focused context beats one giant prompt. This improves quality and keeps the AI from getting “overwhelmed” by too much at once. As one set of best practices sums up, provide “One Task Focus” and “Relevant info only” to the model, and avoid dumping everything everywhere. By structuring the work into modules (and using techniques like spec summaries or subspec agents), you’ll navigate around context size limits and the AI’s short-term memory cap. Remember, a well-fed AI is like a well-fed function: Give it only the inputs it needs for the job at hand.

4. Build in Self-Checks, Constraints, and Human Expertise

Make your spec not just a to-do list for the agent but also a guide for quality control, and don’t be afraid to inject your own expertise.

A good spec for an AI agent anticipates where the AI might go wrong and sets up guardrails. It also takes advantage of what you know (domain knowledge, edge cases, “gotchas”) so the AI doesn’t operate in a vacuum. Think of the spec as both coach and referee for the AI: It should encourage the right approach and call out fouls.

Use three-tier boundaries: GitHub’s analysis of 2,500+ agent files found that the most effective specs use a three-tier boundary system rather than a simple list of don’ts. This gives the agent clearer guidance on when to proceed, when to pause, and when to stop:

✅ Always do: Actions the agent should take without asking. “Always run tests before commits.” “Always follow the naming conventions in the style guide.” “Always log errors to the monitoring service.”

⚠️ Ask first: Actions that require human approval. “Ask before modifying database schemas.” “Ask before adding new dependencies.” “Ask before changing CI/CD configuration.” This tier catches high-impact changes that might be fine but warrant a human check.

🚫 Never do: Hard stops. “Never commit secrets or API keys.” “Never edit node_modules/ or vendor/.” “Never remove a failing test without explicit approval.” “Never commit secrets” was the single most common helpful constraint in the study.

This three-tier approach is more nuanced than a flat list of rules. It acknowledges that some actions are always safe, some need oversight, and some are categorically off-limits. The agent can proceed confidently on “Always” items, flag “Ask first” items for review, and hard-stop on “Never” items.
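To see how the three tiers might be enforced mechanically (for example, in a wrapper that vets an agent’s proposed actions), here is a sketch. The `verb:target` action strings, the glob patterns, and the choice to treat unlisted actions as “Ask first” are all assumptions; the study describes the tiers, not an implementation.

```python
from __future__ import annotations

from enum import Enum
from fnmatch import fnmatchcase


class Tier(Enum):
    ALWAYS = "always"
    ASK_FIRST = "ask first"
    NEVER = "never"


# Hypothetical policy. Actions are written as "verb:target"; patterns are shell-style globs.
POLICY: dict[Tier, list[str]] = {
    Tier.NEVER: ["commit:*secret*", "commit:*.env", "edit:node_modules/*", "edit:vendor/*"],
    Tier.ASK_FIRST: ["edit:prisma/schema.prisma", "add-dependency:*", "edit:.github/workflows/*"],
    Tier.ALWAYS: ["run:npm test", "run:npm run lint*", "edit:src/*", "edit:tests/*"],
}


def classify(action: str) -> Tier:
    """Check hard stops first, then approval-required, then always-allowed."""
    for tier in (Tier.NEVER, Tier.ASK_FIRST, Tier.ALWAYS):
        if any(fnmatchcase(action, pattern) for pattern in POLICY[tier]):
            return tier
    # Anything the policy doesn't mention goes to a human rather than being allowed silently.
    return Tier.ASK_FIRST


if __name__ == "__main__":
    for act in ("run:npm test", "add-dependency:lodash", "edit:node_modules/react/index.js"):
        print(act, "->", classify(act).value)
```

Checking “Never” first means a hard stop always wins when an action matches patterns in more than one tier.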

Encourage self-verification: One powerful pattern is to have the agent verify its work against the spec automatically. If your tooling allows, you can integrate checks like unit tests or linting that the AI can run after generating code. But even at the spec/prompt level, you can instruct the AI to double-check (e.g., “After implementing, compare the result with the spec and confirm all requirements are met. List any spec items that are not addressed.”). This pushes the LLM to reflect on its output relative to the spec, catching omissions. It’s a form of self-audit built into the process.

For instance, you might append to a prompt: “(After writing the function, review the above requirements list and ensure each is satisfied, marking any missing ones).” The model will then (ideally) output the code followed by a short checklist indicating whether it met each requirement. This reduces the chance it forgets something before you even run tests. It’s not foolproof, but it helps.
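Asking for a fixed, machine-readable checklist format makes the model’s self-audit easy to act on. A sketch, assuming you control the prompt wording: The `R<n>: MET` / `R<n>: MISSING` convention and both helpers are hypothetical, and any requirement the model fails to report is treated as unmet rather than trusted.

```python
from __future__ import annotations

import re


def self_check_suffix(requirements: list[str]) -> str:
    """Prompt suffix asking for one machine-readable status line per requirement."""
    items = "\n".join(f"R{i}. {req}" for i, req in enumerate(requirements, start=1))
    return (
        "After writing the code, review these requirements and output one line per item "
        "in the form 'R<n>: MET' or 'R<n>: MISSING - <reason>'.\n" + items
    )


CHECK_RE = re.compile(r"^R(\d+):\s*(MET|MISSING)\b", re.MULTILINE)


def unmet(reply: str, n_requirements: int) -> list[int]:
    """Requirement numbers reported MISSING or not reported at all."""
    status = {int(num): verdict for num, verdict in CHECK_RE.findall(reply)}
    return [i for i in range(1, n_requirements + 1) if status.get(i) != "MET"]


if __name__ == "__main__":
    reqs = ["Validate email format", "Return 400 on invalid input", "Hash passwords"]
    print(self_check_suffix(reqs))
    print(unmet("...code...\nR1: MET\nR2: MISSING - no 400 path\n", len(reqs)))  # [2, 3]
```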

LLM-as-a-Judge for subjective checks: For criteria that are hard to test automatically (code style, readability, adherence to architectural patterns), consider using “LLM-as-a-Judge.” This means having a second agent (or a separate prompt) review the first agent’s output against your spec’s quality guidelines. Anthropic and others have found this effective for subjective evaluation. You might prompt, “Review this code for adherence to our style guide. Flag any violations.” The judge agent returns feedback that either gets incorporated or triggers a revision. This adds a layer of semantic evaluation beyond syntax checks.
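The judge step is essentially a review-and-revise loop with a bounded number of rounds. In this sketch `judge` and `revise` are injected callables; in practice each would wrap a model call, but here they are toy stand-ins (a rule that flags `var` declarations) so the control flow can run on its own. The names and the three-round cap are assumptions.

```python
from __future__ import annotations

from typing import Callable

Judge = Callable[[str, str], list[str]]  # (code, guidelines) -> violations
Reviser = Callable[[str, list[str]], str]  # (code, violations) -> revised code


def review_loop(
    code: str, guidelines: str, judge: Judge, revise: Reviser, max_rounds: int = 3
) -> tuple[str, list[str]]:
    """Alternate judging and revising until the judge is satisfied or rounds run out."""
    violations = judge(code, guidelines)
    for _ in range(max_rounds):
        if not violations:
            break
        code = revise(code, violations)
        violations = judge(code, guidelines)
    return code, violations  # leftover violations go to a human


# Toy stand-ins so the loop runs without a model.
def toy_judge(code: str, guidelines: str) -> list[str]:
    return ["use const/let instead of var"] if "var " in code else []


def toy_revise(code: str, violations: list[str]) -> str:
    return code.replace("var ", "const ")


if __name__ == "__main__":
    print(review_loop("var x = 1;", "No var declarations.", toy_judge, toy_revise))
```

Capping the rounds keeps a judge and a reviser that disagree from looping forever; whatever remains is surfaced for human review.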

Conformance testing: Willison advocates building conformance suites: language-independent tests (often YAML-based) that any implementation must pass. These act as a contract: If you’re building an API, the conformance suite specifies expected inputs/outputs, and the agent’s code must satisfy all cases. This is more rigorous than ad hoc unit tests because it’s derived directly from the spec and can be reused across implementations. Include conformance criteria in your spec’s success section (e.g., “Must pass all cases in conformance/api-tests.yaml”).
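A conformance suite is just cases-as-data plus a runner that any implementation can be plugged into. The article describes YAML files; this sketch uses JSON only because Python’s standard library can parse it without extra packages. The case schema, the `/tasks` route, and `create_task` are hypothetical examples.

```python
from __future__ import annotations

import json
from typing import Any, Callable

# Cases as data (would normally live in a file such as conformance/api-tests.yaml).
CASES = json.loads("""
[
  {"name": "creates task",
   "request": {"method": "POST", "path": "/tasks", "body": {"title": "Write spec"}},
   "expect": {"status": 201}},
  {"name": "rejects empty title",
   "request": {"method": "POST", "path": "/tasks", "body": {"title": ""}},
   "expect": {"status": 400, "body": {"error": "title is required"}}}
]
""")

Handler = Callable[[dict[str, Any]], tuple[int, dict[str, Any]]]


def run_conformance(handler: Handler, cases: list[dict[str, Any]]) -> list[str]:
    """Run every case against the handler; return a description of each failure."""
    failures = []
    for case in cases:
        status, body = handler(case["request"])
        exp = case["expect"]
        if status != exp["status"]:
            failures.append(f"{case['name']}: status {status} != {exp['status']}")
        elif "body" in exp and body != exp["body"]:
            failures.append(f"{case['name']}: body {body} != {exp['body']}")
    return failures


# One candidate implementation; any other implementation can be checked the same way.
def create_task(request: dict[str, Any]) -> tuple[int, dict[str, Any]]:
    title = request["body"].get("title", "")
    if not title:
        return 400, {"error": "title is required"}
    return 201, {"id": 1, "title": title}


if __name__ == "__main__":
    print(run_conformance(create_task, CASES) or "all cases pass")
```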

Leverage testing in the spec: If possible, incorporate a test plan or even actual tests in your spec and prompt flow. In traditional development, we use TDD or write test cases to clarify requirements, and you can do the same with AI. For example, in the spec’s success criteria, you might say, “These sample inputs should produce these outputs…” or “The following unit tests should pass.” The agent can be prompted to run through these cases in its head or actually execute them if it has that capability. Willison noted that having a robust test suite is like giving the agents superpowers: They can validate and iterate quickly when tests fail. In an AI coding context, writing a bit of pseudocode for tests or expected outcomes in the spec can guide the agent’s implementation. Additionally, you can use a dedicated “test agent” in a subagent setup that takes the spec’s criteria and continuously verifies the “code agent’s” output.

Bring your domain knowledge: Your spec should reflect insights that only an experienced developer or someone with context would know. For example, if you’re building an ecommerce agent and you know that “products” and “categories” have a many-to-many relationship, state that clearly. (Don’t assume the AI will infer it; it won’t.) If a certain library is notoriously tricky, mention pitfalls to avoid. Essentially, pour your mentorship into the spec. The spec can contain advice like “If using library X, watch out for the memory leak issue in version Y (apply workaround Z).” This level of detail is what turns an average AI output into a truly robust solution, because you’ve steered the AI away from common traps.

Also, if you have preferences or style guidelines (say, “use functional components over class components in React”), encode them in the spec. The AI will then emulate your style. Many engineers even include small examples in the spec (for instance, “All API responses should be JSON, e.g., {"error": "message"} for errors.”). A quick example anchors the AI to the exact format you want.
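A concrete format example in the spec is also easy to enforce mechanically. This is a minimal sketch that assumes the rule is exactly “errors are JSON objects with a single string field named error.” The function name is hypothetical. A check like this could run against the agent’s API responses:

```python
import json

def matches_error_format(body: str) -> bool:
    """Return True if body is JSON shaped exactly like {"error": "message"}."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False  # not JSON at all, e.g., a plain-text error page
    return (
        isinstance(payload, dict)
        and set(payload) == {"error"}          # exactly one key, named "error"
        and isinstance(payload["error"], str)  # whose value is a string message
    )

print(matches_error_format('{"error": "message"}'))       # True
print(matches_error_format('{"message": "bad request"}'))  # False
print(matches_error_format("Internal Server Error"))       # False
```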

Minimalism for simple tasks: While we advocate thorough specs, part of expertise is knowing when to keep it simple. For relatively simple, isolated tasks, an overbearing spec can confuse more than it helps. If you’re asking the agent to do something straightforward (like “center a div on the page”), you might just say, “Make sure to keep the solution concise and don’t add extraneous markup or styles.” No need for a full PRD there. Conversely, complex tasks (like “implement an OAuth flow with token refresh and error handling”) are when you break out the detailed spec. A good rule of thumb: Match spec detail to task complexity. Don’t underspec a hard problem (the agent will flail or go off-track), but don’t overspec a trivial one (the agent might get tangled or use up context on unnecessary instructions).

Maintain the AI’s “persona” if needed: Sometimes part of your spec defines how the agent should behave or respond, especially if the agent interacts with users. For example, if you’re building a customer support agent, your spec might include guidelines like “Use a friendly and professional tone” and “If you don’t know the answer, ask for clarification or offer to follow up rather than guessing.” These kinds of rules (often included in system prompts) help keep the AI’s outputs aligned with expectations. They’re essentially spec items for AI behavior. Keep them consistent, and remind the model of them in long sessions if needed. (LLMs can “drift” in style over time if not kept on a leash.)

You remain the exec in the loop: The spec empowers the agent, but you remain the ultimate quality filter. If the agent produces something that technically meets the spec but doesn’t feel right, trust your judgment. Either refine the spec or directly adjust the output. The great thing about AI agents is that they don’t get offended: If they deliver a design that’s off, you can say, “Actually, that’s not what I intended; let’s clarify the spec and redo it.” The spec is a living artifact in collaboration with the AI, not a one-time contract you can’t change.

Simon Willison humorously likened working with AI agents to “a very weird form of management,” and even said that “getting good results out of a coding agent feels uncomfortably close to managing a human intern.” You need to provide clear instructions (the spec), ensure they have the necessary context (the spec and relevant files), and give actionable feedback. The spec sets the stage, but monitoring and feedback during execution are key. If an AI is a “weird digital intern who will absolutely cheat if you give them a chance,” the spec and constraints you write are how you prevent that cheating and keep them on task.

Here’s the payoff: A good spec doesn’t just tell the AI what to build; it also helps it self-correct and stay within safe boundaries. By baking in verification steps, constraints, and your hard-earned knowledge, you drastically increase the odds that the agent’s output is correct on the first try (or at least much closer to correct). This reduces iterations and those “Why on Earth did it do that?” moments.

5. Test, Iterate, and Evolve the Spec (and Use the Right Tools)

Think of spec writing and agent building as an iterative loop: test early, gather feedback, refine the spec, and leverage tools to automate checks.

The initial spec is not the end; it’s the beginning of a cycle. The best outcomes come when you regularly verify the agent’s work against the spec and adjust accordingly. Modern AI developers also use various tools to support this process, from CI pipelines to context management utilities.

Continuous testing: Don’t wait until the end to see if the agent met the spec. After each major milestone, or even each function, run tests or at least do quick manual checks. If something fails, update the spec or prompt before proceeding. For example, if the spec said, “Passwords must be hashed with bcrypt” and you see the agent’s code storing plain text, stop and correct it (and reinforce the rule in the spec or prompt). Automated tests shine here: If you provided tests (or write them as you go), let the agent run them. In many coding agent setups, you can have the agent run npm test or similar after finishing a task. The results (failures) can then feed back into the next prompt, effectively telling the agent “Your output didn’t meet spec on X, Y, Z; fix it.” This kind of agentic loop (code > test > fix > repeat) is extremely powerful and is how tools like Claude Code or Copilot Labs are evolving to handle larger tasks. Always define what “done” means (via tests or criteria) and check for it.
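The code > test > fix > repeat loop can be outlined in a few lines. This is a hypothetical sketch, not how Claude Code or any other tool implements it. run_tests and ask_agent_to_fix are stand-ins that you would wire to your own test command (for example, npm test via subprocess) and to your agent’s API. max_rounds is an assumed safety limit so the loop cannot run forever.

```python
from typing import Callable

def agentic_loop(
    run_tests: Callable[[], list[str]],
    ask_agent_to_fix: Callable[[str], None],
    max_rounds: int = 5,
) -> bool:
    """Run tests, feed failures back to the agent, and repeat.

    run_tests returns failure messages (an empty list means "done").
    Returns True once the tests pass, or False if they still fail after
    max_rounds fix attempts.
    """
    for _ in range(max_rounds):
        failures = run_tests()
        if not failures:
            return True
        feedback = "Your output didn't meet spec on:\n" + "\n".join(
            f"- {failure}" for failure in failures
        )
        ask_agent_to_fix(feedback)
    return not run_tests()  # final check after the last fix attempt


# Toy stand-ins: each "fix" resolves the first outstanding failure.
outstanding = ["passwords stored in plain text", "login route missing rate limit"]

def fake_test_run() -> list[str]:
    return list(outstanding)

def fake_agent(feedback: str) -> None:
    print(feedback)
    outstanding.pop(0)

print(agentic_loop(fake_test_run, fake_agent))  # True, after two fix rounds
```

The explicit round limit is a design choice. An agent that keeps failing the same tests usually points to a gap in the spec, which is a signal to stop and revise the spec rather than keep looping.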

Iterate on the spec itself: If you discover that the spec was incomplete or unclear (maybe the agent misunderstood something, or you realized you missed a requirement), update the spec document. Then explicitly resync the agent with the new spec: “I’ve updated the spec as follows… Given the updated spec, adjust the plan or refactor the code accordingly.” This way the spec remains the single source of truth. It’s similar to how we handle changing requirements in normal development, except that here you’re also the product manager for your AI agent. Keep a version history if possible (even just via commit messages or notes), so you know what changed and why.
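Resyncing is easier when the agent sees exactly what changed. This is a minimal sketch that assumes you keep the previous and current spec text on hand. The function name, the SPEC.md filename, and the prompt wording are illustrative. Python’s standard difflib builds a unified diff that can be embedded in the resync message.

```python
import difflib

def resync_prompt(old_spec: str, new_spec: str, path: str = "SPEC.md") -> str:
    """Build a resync message that embeds a unified diff of the spec."""
    diff = difflib.unified_diff(
        old_spec.splitlines(keepends=True),
        new_spec.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return (
        "I've updated the spec as follows:\n\n"
        + "".join(diff)
        + "\nGiven the updated spec, adjust the plan or refactor the code accordingly."
    )

old = "## Auth\n- Passwords hashed with bcrypt\n"
new = "## Auth\n- Passwords hashed with bcrypt\n- Lock account after 5 failed logins\n"
print(resync_prompt(old, new))
```

Sending only the diff, rather than the whole spec again, also keeps the resync message small.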

Use context management and memory tools: There’s a growing ecosystem of tools that help manage AI agent context and knowledge. For instance, retrieval-augmented generation (RAG) is a pattern in which the agent pulls in relevant chunks of data from a knowledge base (like a vector database) on the fly. If your spec is huge, you could embed sections of it and let the agent retrieve the most relevant parts when needed, instead of always providing the whole thing. There are also frameworks implementing the Model Context Protocol (MCP), which helps automate feeding the right context to the model based on the current task. One example is Context7 (context7.com), which can auto-fetch relevant context snippets from docs based on what you’re working on. In practice, this might mean the agent notices you’re working on “payment processing” and pulls the payments section of your spec or documentation into the prompt. Consider leveraging such tools or setting up a rudimentary version (even a simple search in your spec document).
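The “simple search in your spec document” option can be surprisingly small. This sketch assumes a Markdown spec whose sections start with “## ” headings, which is an assumption about your spec’s layout. It ranks sections by how many query words they contain. This is plain keyword matching, not the embedding-based retrieval that a real RAG setup or MCP server would use.

```python
import re

def split_sections(spec: str) -> dict[str, str]:
    """Split a Markdown spec into {heading: body}, breaking on '## ' headings."""
    sections: dict[str, str] = {}
    current = "Preamble"  # text before the first '## ' heading
    for line in spec.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = ""
        else:
            sections[current] = sections.get(current, "") + line + "\n"
    return sections

def relevant_sections(spec: str, query: str, top_k: int = 2) -> list[str]:
    """Rank sections by how many distinct query words they contain."""
    query_words = set(re.findall(r"\w+", query.lower()))
    scored = []
    for heading, body in split_sections(spec).items():
        words = set(re.findall(r"\w+", f"{heading} {body}".lower()))
        overlap = len(query_words & words)
        if overlap:
            scored.append((overlap, heading))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep spec order
    return [heading for _, heading in scored[:top_k]]

# Hypothetical spec used only for illustration.
SPEC = """# Shop spec
## Payments
Never log card numbers during payment processing.
## Catalog
Products and categories have a many-to-many relationship.
## Auth
Passwords are hashed with bcrypt.
"""

print(relevant_sections(SPEC, "payment processing bug"))  # ['Payments']
```

Only the returned sections would go into the prompt, which keeps context focused without sending the entire spec.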

Parallelize carefully: Some developers run multiple agent instances in parallel on different tasks (as mentioned earlier with subagents). This can speed up development (e.g., one agent generates code while another simultaneously writes tests, or two features are built concurrently). If you go this route, ensure the tasks are truly independent or clearly separated to avoid conflicts. (The spec should note any dependencies.) For example, don’t have two agents writing to the same file at once. One workflow is to have one agent generate code while another reviews it in parallel; another is to build separate components that integrate later. This is advanced usage and can be mentally taxing to manage. (As Willison admitted, running multiple agents is surprisingly effective, if mentally exhausting!) Start with at most 2–3 agents to keep things manageable.
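Whether parallel tasks are truly independent can be checked before any agent starts. This is a minimal sketch that assumes each task in the plan lists the files it is allowed to touch. The task names and the plan format are hypothetical. It flags every file that more than one task claims.

```python
from collections import defaultdict

def file_conflicts(plan: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each file claimed by two or more tasks to the tasks that claim it."""
    owners: dict[str, list[str]] = defaultdict(list)
    for task, files in plan.items():
        for path in files:
            owners[path].append(task)
    return {path: tasks for path, tasks in owners.items() if len(tasks) > 1}

# Hypothetical plan for three parallel agents.
plan = {
    "code-agent: checkout flow": {"src/checkout.ts", "src/cart.ts"},
    "test-agent: checkout tests": {"tests/checkout.test.ts"},
    "code-agent: cart badge": {"src/cart.ts", "src/header.ts"},
}
print(file_conflicts(plan))
# {'src/cart.ts': ['code-agent: checkout flow', 'code-agent: cart badge']}
```

An empty result means the tasks can run side by side. Any conflict should either become a dependency noted in the spec or be resolved by splitting the shared file’s work into sequential steps.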

Version control and spec locks: Use Git or your version control of choice to track what the agent does. Good version control habits matter even more with AI assistance. Commit the spec file itself to the repo. This not only preserves history; the agent can also use git diff or git blame to understand changes. (LLMs are quite capable of reading diffs.) Some advanced agent setups let the agent query the VCS history to see when something was introduced, and models can be surprisingly “fiercely competent at Git.” Keeping your spec in the repo lets both you and the AI track its evolution. There are tools (like GitHub Spec Kit, mentioned earlier) that integrate spec-driven development into the Git workflow, for instance by gating merges on updated specs or generating checklists from spec items. You don’t need these tools to succeed, but the takeaway stands: Treat the spec like code, and maintain it diligently.

Cost and speed considerations: Working with large models and long contexts can be slow and expensive. A practical tip is to use model selection and batching smartly. Perhaps use a cheaper, faster model for initial drafts or repetitive work, and reserve the most capable (and expensive) model for final outputs or complex reasoning. Some developers use GPT-4 or Claude for planning and critical steps, but offload simpler expansions or refactors to a local model or a smaller API model. If you use multiple agents, not all of them need to be top tier; a test-running agent or a linter agent could be a smaller model. Also consider throttling context size: Don’t feed 20K tokens if 5K will do. As we discussed, more tokens can mean diminishing returns.
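Routing by task type and capping context can both be expressed as a small policy. The sketch below is illustrative only. The model names are placeholders rather than real model identifiers, and the task categories are assumptions. Token counts are approximated as words × 4/3, a rough rule of thumb for English rather than a real tokenizer.

```python
import math

# Placeholder model names, not real identifiers; map them to your provider's models.
ROUTES = {
    "plan": "large-model",
    "critical": "large-model",
    "draft": "small-model",
    "refactor": "small-model",
    "lint": "small-model",
    "run-tests": "small-model",
}

def pick_model(task_kind: str) -> str:
    """Route cheap, repetitive work to the small model; default to the large one."""
    return ROUTES.get(task_kind, "large-model")

def approx_tokens(text: str) -> int:
    """Rough estimate of about 4 tokens per 3 English words (not a real tokenizer)."""
    return math.ceil(len(text.split()) * 4 / 3)

def fit_context(chunks: list[str], budget: int) -> list[str]:
    """Keep chunks in priority order, stopping at the first that would exceed budget."""
    kept: list[str] = []
    used = 0
    for chunk in chunks:
        cost = approx_tokens(chunk)
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept

print(pick_model("refactor"))  # small-model
print(pick_model("plan"))      # large-model
chunks = ["spec summary " * 30, "payments section " * 300, "style guide " * 30]
print(len(fit_context(chunks, budget=500)))  # 1
```

Defaulting unknown task kinds to the larger model trades some cost for safety. Stopping at the first chunk that doesn’t fit preserves the priority order you chose, instead of backfilling with less important material.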

Monitor and log everything: In complex agent workflows, logging the agent’s actions and outputs is essential. Check the logs to see whether the agent is deviating or encountering errors. Many frameworks provide trace logs or allow printing the agent’s chain of thought (especially if you prompt it to think step-by-step). Reviewing these logs can highlight where the spec or instructions might have been misinterpreted. It’s not unlike debugging a program, except the “program” is the conversation or prompt chain. If something weird happens, go back to the spec and instructions to see if there was ambiguity.
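Logs are most useful when each step is a structured record you can search later. This is a minimal sketch that assumes a hypothetical agent loop calling log_step once per action. Each call appends one JSON object per line (the JSON Lines convention) containing a timestamp, the step kind, and its details.

```python
import io
import json
import time
from typing import Any, TextIO

def log_step(stream: TextIO, kind: str, **details: Any) -> None:
    """Append one agent action to stream as a single JSON line."""
    record = {"ts": time.time(), "kind": kind, **details}
    stream.write(json.dumps(record) + "\n")

def steps_of_kind(lines: list[str], kind: str) -> list[dict]:
    """Parse logged lines and keep only records of the given kind."""
    return [record for record in map(json.loads, lines) if record["kind"] == kind]

# Demo with an in-memory stream; in practice you would open a file in append mode.
log = io.StringIO()
log_step(log, "prompt", text="Implement login per the Auth section of the spec")
log_step(log, "tool", command="npm test", exit_code=1)
log_step(log, "test_failure", test="hashes passwords with bcrypt")
for record in steps_of_kind(log.getvalue().splitlines(), "test_failure"):
    print(record["test"])  # hashes passwords with bcrypt
```

Filtering by kind makes it quick to line up each failure with the prompt that preceded it. That is usually where an ambiguity in the spec shows up.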

Learn and improve: Finally, treat each project as a learning opportunity to refine your spec-writing skill. Maybe you’ll discover that a certain phrasing consistently confuses the AI, or that organizing spec sections in a certain way yields better adherence. Incorporate those lessons into the next spec. The field of AI agents is evolving rapidly, so new best practices (and tools) emerge constantly. Stay updated via blogs (like those by Simon Willison and Andrej Karpathy), and don’t hesitate to experiment.

A spec for an AI agent isn’t “write once, done.” It’s part of a continuous cycle of instructing, verifying, and refining. The payoff for this diligence is substantial: By catching issues early and keeping the agent aligned, you avoid costly rewrites or failures later. As one AI engineer quipped, using these practices can feel like having “an army of interns” working for you, but you have to manage them well. A good spec, continuously maintained, is your management tool.

Avoid Common Pitfalls

Before wrapping up, it’s worth calling out antipatterns that can derail even well-intentioned spec-driven workflows. The GitHub study of 2,500+ agent files revealed a stark divide: “Most agent files fail because they’re too vague.” Here are the mistakes to avoid:

Vague prompts: “Build me something cool” or “Make it work better” gives the agent nothing to anchor on. As Baptiste Studer puts it: “Vague prompts mean wrong results.” Be specific about inputs, outputs, and constraints. “You are a helpful coding assistant” doesn’t work. “You are a test engineer who writes tests for React components, follows these examples, and never modifies source code” does.

Overlong contexts without summarization: Dumping 50 pages of documentation into a prompt and hoping the model figures it out rarely works. Use hierarchical summaries (as discussed in principle 3) or RAG to surface only what’s relevant. Context length is not a substitute for context quality.

Skipping human review: Willison has a personal rule: “I won’t commit code I couldn’t explain to someone else.” Just because the agent produced something that passes tests doesn’t mean it’s correct, secure, or maintainable. Always review critical code paths. The “house of cards” metaphor applies: AI-generated code can look solid but collapse under edge cases you didn’t test.

Conflating vibe coding with production engineering: Rapid prototyping with AI (“vibe coding”) is great for exploration and throwaway projects. But shipping that code to production without rigorous specs, tests, and review is asking for trouble. I distinguish “vibe coding” from “AI-assisted engineering”; the latter requires the discipline this guide describes. Know which mode you’re in.

Ignoring the “lethal trifecta”: Willison warns of three properties that make AI agents dangerous: speed (they work faster than you can review), nondeterminism (same input, different outputs), and cost (which encourages cutting corners on verification). Your spec and review process must account for all three. Don’t let speed outpace your ability to verify.

Missing the six core areas: If your spec doesn’t cover commands, testing, project structure, code style, git workflow, and boundaries, you’re likely missing something the agent needs. Use the six-area checklist from section 2 as a sanity check before handing off to the agent.
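The six-area sanity check can be automated as a pre-flight step. This sketch assumes a Markdown spec and simply looks for each area’s name, or an assumed synonym, in the headings. The keyword lists are guesses that you would tune to your own spec template.

```python
import re

# The six core areas, each with heading keywords that count as coverage.
# The synonyms are assumptions; adjust them to match your own spec template.
CORE_AREAS = {
    "commands": ["commands", "scripts"],
    "testing": ["testing", "tests"],
    "project structure": ["project structure", "layout"],
    "code style": ["code style", "style"],
    "git workflow": ["git workflow", "git"],
    "boundaries": ["boundaries", "always", "never"],
}

def missing_areas(spec: str) -> list[str]:
    """Return the core areas with no matching Markdown heading in the spec."""
    headings = " ".join(
        line.lstrip("#").strip().lower()
        for line in spec.splitlines()
        if line.startswith("#")
    )
    return [
        area
        for area, keywords in CORE_AREAS.items()
        if not any(re.search(rf"\b{re.escape(k)}\b", headings) for k in keywords)
    ]

# Hypothetical spec that covers only half of the areas.
SPEC = """# Checkout service
## Commands
npm test, npm run build
## Testing
Jest, colocated *.test.ts files
## Code Style
Functional React components only
"""
print(missing_areas(SPEC))  # ['project structure', 'git workflow', 'boundaries']
```

Matching on headings rather than body text is deliberately strict. A passing mention of “git” in a paragraph is not the same as a section that tells the agent how to branch and commit.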

Conclusion

Writing an effective spec for AI coding agents requires solid software engineering principles combined with adaptation to LLM quirks. Start with clarity of purpose and let the AI help expand the plan. Structure the spec like a serious design document: cover the six core areas and integrate it into your toolchain so it becomes an executable artifact, not just prose. Keep the agent’s focus tight by feeding it one piece of the puzzle at a time (and consider clever tactics like summary TOCs, subagents, or parallel orchestration to handle big specs). Anticipate pitfalls by including three-tier boundaries (always/ask first/never), self-checks, and conformance tests; in short, teach the AI how not to fail. And treat the whole process as iterative: Use tests and feedback to refine both the spec and the code continuously.

Follow these guidelines and your AI agent will be far less likely to “break down” under large contexts or wander off into nonsense.

Happy spec-writing!

On March 26, join Addy and Tim O’Reilly at AI Codecon: Software Craftsmanship in the Age of AI, where an all-star lineup of experts will go deeper into orchestration, agent coordination, and the new skills developers need to build excellent software that creates value for all participants. Sign up for free here.
