Executive Summary
This article is the third in a series arguing that software development is shifting from code-first to intent-first delivery. Specifications are the new source code and they need the same engineering discipline. This article covers what an executable specification actually contains, the roles and rituals that make spec authoring sustainable, the tooling stack teams use today, the failure modes that reliably trip up early adopters, and the dynamics of running this practice in embedded-team setups where authoring authority crosses an organizational line.
A team I worked with shipped a feature meant to flag risky transactions for compliance review. The spec said something close to that: “flag risky transactions and route them to the compliance queue.” The engineering team reached for the fraud-scoring model already in production. The stakeholders meant something different; they meant that any transaction touching a sanctioned counterparty, regardless of fraud score, should land in the queue. The system worked exactly as specified and was useless for the actual problem.
The cost was three engineer-weeks of rework, a missed compliance deadline, and a meeting where everyone agreed that the word “risky” had been doing far too much work in that sentence. Nobody had been careless. It was the kind of spec we have all written; the kind that, in a pre-AI world, would have produced a partial implementation, a clarifying conversation, and a course correction within a sprint. In an intent-first world, it produced a full implementation pointed at the wrong target, at machine speed.
That has always been a problem. When an AI agent generates ten thousand lines of code from an ambiguous specification, it becomes the problem. This piece is about what fixing it actually looks like in practice, drawing on patterns I have seen consistently across engagements, with details composited and anonymized to protect specific clients.
What is an Executable Specification?
An executable specification is a structured requirements artifact that defines what a system should do, the rules it must follow, and how success will be verified. It is precise enough for a reasoning system to validate, complete enough for a generator to build from, and explicit enough for a test harness to verify against.
Most teams, when asked to produce a specification, produce a description.A description tells a reader what the system should do. It is human-readable and machine-useless. That gap has always existed. In a world where AI agents generate thousands of lines of code from written intent, it produces complete implementations of the wrong behavior, at full speed.
The Five Components of an Executable Specification
In practice, a working specification has five components.
The Intent Statement
This is a short, prose-level description of what the feature is for, who it serves, and why it exists. This is the only part that looks like traditional product writing. It exists to give downstream readers, human and machine, the framing they need to interpret everything else. Two or three sentences is usually enough. If it runs longer, the feature is probably more than one feature.
The Domain Model
It refers to the entities, relationships, and invariants the feature operates on. A supplier scorecard feature, for example, needs an explicit definition of what a supplier is, what a score is, what dimensions roll up into a score, and what relationships connect them. If the team cannot agree on the domain model, no amount of code review later will save the feature. This is the layer where most ambiguity actually lives, and the layer most often skipped.
The Behavioral Rules
These are the conditions under which the system must do something, expressed as machine-checkable statements rather than prose. “When a supplier’s on-time delivery rate falls below 85 percent for two consecutive quarters, their score is downgraded by one tier” is a behavioral rule. “The system should handle poor-performing suppliers appropriately” is not. The rule of thumb is that If you cannot write a test for it, it is not a behavioral rule yet.
The Constraints and Policies
These are the non-functional boundaries the system must respect, like performance budgets, privacy policies, audit requirements, fairness constraints, or regulatory mandates. These are often the most consequential part of the spec and the most commonly omitted, because they are invisible until they fail. Encoding them upfront, as policy-as-code rather than a paragraph in a compliance document, is how you avoid discovering compliance gaps through a regulator’s question rather than a spec review.
The Acceptance Evidence
This is the proof needed for the feature to be considered done. It includes property tests for invariants, scenario tests for behavioral rules, telemetry signals for runtime properties, and audit trails for compliance constraints. This is part of the specification itself. A specification without acceptance evidence is a wish list.
Together these five elements describe a feature precisely enough to generate from, completely enough to validate against, and concisely enough to review meaningfully. They also expose, with some brutality, the places where the team has not yet decided what they want. That exposure is the point.
The structure below is composited from scoring features I have worked on across more than one engagement; the details are abstracted, but the form is the form that has actually produced clean delivery.

What changes in delivery is consistent enough to be worth naming. Scope debates that previously surfaced in week three of the build, when an engineer hit an edge case the spec hadn’t addressed, instead surface in spec review, where they cost an hour and not a sprint. Compliance review runs in parallel because the constraints are already encoded. Conversations about what the score actually means happen before code is written rather than during the demo.
Who Should Write Executable Specifications?
The hardest part of intent-first delivery is organizational. Specifications of the kind described above cannot be written by a product manager alone, or by an engineer alone, or by a domain expert alone. They have to be co-authored, and the rituals around that co-authoring determine whether the practice scales.
The pattern that consistently works has three roles in the room from the start.
The Product Owner
This figure owns the intent statement and the prioritization of behavioral rules. They carry the burden of saying what the feature is for and what trade-offs are acceptable. In an intent-first world this role becomes more accountable, because decisions are encoded directly into what the system will build. Sloppy product thinking used to get diluted by the human interpretation layer between PM and engineer. That layer is shrinking.
The Domain Expert
They own the domain model and validates the behavioral rules against real-world conditions. In healthcare, this is a clinician. In supply chain, it is a procurement lead. In fintech, a risk officer. Domain experts have always been consulted; the shift is that they now co-author the artifact rather than reviewing it later. This is uncomfortable for organizations that treat domain expertise as a consulting input rather than an engineering input.
The Engineer
They own the constraints, the acceptance evidence, and the technical feasibility of the rules as written. The engineer’s job in spec authoring is to push back on ambiguity, surface infeasible constraints, and ensure the spec produces something testable. This is closer to what senior engineers actually do well, reasoning about systems, than what junior code review traditionally asked of them.
The Spec Review Ritual
The ritual that holds these three together is the spec review, which is closer in feel to a design review than a sprint planning meeting. The spec is opened in a structured editor and walked through section by section. Ambiguities get flagged in the artifact itself; disagreements get resolved by adding rules, not adjectives. What comes out the other side is a versioned, signed-off spec that downstream tooling can act on.
The first time a team runs this, something predictable breaks. Most often it is the domain expert. They have been brought into engineering meetings before, usually to review work that’s already done, and that muscle memory is hard to shake in the first session. They nod. They say “that sounds right.” They do not push back on a behavioral rule that, in their own work, they would never apply that way. The spec gets signed off and the gap surfaces a sprint later. The fix that follows is unglamorous. The facilitator has to actively draw the domain expert out, ask questions in their direction by name, and treat the engineer’s silence as the safer default. After two or three sessions the dynamic settles.
Teams that skip this ritual by having a PM draft the spec alone and circulate it for comment consistently produce incomplete specs. Not wrong, exactly. Incomplete in ways no one notices until generation produces something nobody asked for.
The Tooling Stack Teams Are Assembling Today
The Requirements IDE that the previous article gestured toward does not yet exist as a single product. What does exist is a set of tools that, assembled correctly, cover the same ground. The teams making intent-first delivery work today have stitched these together deliberately rather than waiting for a vendor to ship the complete solution.
These are the six functional layers that need to be covered:
| Layer | What It Does | Teams Use Today |
| Authoring | Structured editing of specifications with templates, schemas, and inline guidance | Domain-specific languages, structured Markdown, spec management platforms, custom internal editors |
| Version Control | Track every change to a spec, attribute it, and make it reversible | Git for text-based specs; spec management platforms with built-in versioning for structured specs |
| Linting and Validation | Catch ambiguity, contradiction, and incompleteness before generation | LLM-based spec reviewers, formal verification tools for high-stakes domains, custom rule engines |
| Behavioral Diffs | Surface what downstream behavior a spec change affects | Emerging space; most teams handle this manually with traceability matrices or impact-analysis scripts |
| Generation | Turn the spec into running code, tests, and infrastructure | Agentic coding systems, scaffolding tools, code generators wired to spec schemas |
| Acceptance | Verify that generated systems meet the rules and constraints in the spec | Property-based testing frameworks, policy-as-code engines, semantic test runners, observability platforms |
The pattern across mature teams is not that they have picked one tool for each layer. It is that they have made deliberate choices about where the spec lives, what schema it follows, and how it flows to the generation and acceptance layers. The schema matters more than the editor. A team writing specs in plain Markdown but following a strict, consistent structure will outperform a team using a sophisticated editor with no shared conventions.
One thing worth saying is that the cost of building this stack is real, and it is justified only when the team is generating enough code from spec to make the investment pay back. For a small team writing a CRUD app, the traditional ticket-and-PR loop is still the right call. For a team where AI is producing the bulk of the implementation, every hour spent improving the spec layer pays back across thousands of generated lines.
Failure Modes That Reliably Trip Up Early Adopters
Five failure modes show up consistently across teams attempting intent-first delivery.
1. Specifying the Implementation, Not the Intent
The most common failure. A team writes a spec that says, in effect, “build a Postgres table with these columns and a REST endpoint at this path.” The spec has been written as an implementation, which means the team has skipped the hard thinking about what the system should do and jumped to how. The generated code looks fine, but the feature does not solve the underlying problem.
I walked into this one personally. The signal that finally made me recognize it came from a junior engineer in a spec review, who asked why the spec specified a particular table structure when the question we were supposedly answering was how scores should behave. I did not have a good answer. Everyone more senior in the room was thinking in the same shortcut. We had drafted the spec quickly, with a schema already in our heads from a similar prior feature, and encoded it as if it were the requirement.
That’s the failure mode in a sentence. It’s invisible to the people most likely to draft the spec, and most visible to the people least likely to feel empowered to push back. The fix is to enforce in spec review that no implementation details appear above the constraint layer. If you find yourself writing a column name in the domain model section, stop.
2. Confusing Prose for Precision
A spec full of words like “appropriately,” “reasonable,” “as needed,” or “where applicable” is not a spec. The test is whether two engineers reading the same rule would write the same test for it. If not, the rule is not yet a rule. Spec linters catch some of this; disciplined review catches the rest.
3. Skipping the Domain Model
Teams in a hurry write the intent statement and the behavioral rules and skip the domain model, because the entities feel obvious. They are never as obvious as they feel. Half the bugs in shipped intent-first features trace back to a missing or contradictory entity definition that everyone in the room thought everyone else understood. The domain model is the cheapest place to catch these.
4. Treating Constraints as Someone Else’s Problem
Engineers write the behavioral rules. The compliance team is consulted at the end. The constraints get added as a checklist, not as policy-as-code. Six months later a regulator asks how the team enforces a rule, and the answer is a Confluence page from before the rebuild. Constraints belong in the spec, encoded in executable form, from the first version. If the compliance team is not in the spec review, the spec is incomplete.
5. Versioning the Spec Separately from the Code
A team adopts intent-first authoring, then keeps the specs in a separate system from the codebase. Within a release or two, the specs and the code drift apart. A bug fix updates the code; the spec is not touched. Now the spec is a lie, and trust in the practice collapses. The fix is mechanical. The spec lives next to the code, in the same repository, and a change to one without a corresponding change to the other fails CI.

Spec Authorship in Embedded-Team Engagements
Most writing about intent-first delivery assumes a single organization where product owners, domain experts, and engineers all sit on the same org chart, share a backlog, and answer to the same VP. A significant share of high-quality engineering doesn’t happen that way. When engineers are embedded inside a client organization, authoring authority crosses an organizational line, and that changes the dynamics of the practice in ways worth addressing directly.
Who Actually Owns the Spec
The first question an embedded team has to answer, and usually answers wrong the first time, is who has authoring authority. The instinct on the embedded side is to write the spec themselves, because they know what “good” looks like, and circulate it for sign-off. This works once. It doesn’t work twice, because the client correctly perceives they are being handed a contract drafted by the other party. The opposite extreme is just as bad: leaving spec authoring entirely to the client, which abandons the engineering rigor that motivated the practice in the first place.
The pattern that holds is co-authorship in form and client ownership in name. The spec lives in a system the client controls. The client product owner is the named author. The embedded team contributes structure, pushes back on ambiguity, and writes the constraint and acceptance sections the client organization is unlikely to staff for. Sign-off is the client’s. This is sometimes uncomfortable for embedded engineers who feel they are doing most of the cognitive work without controlling the artifact, and they are. The trade-off is that the spec becomes something the client will defend a year from now, when the engagement has rotated and the next team needs it to still be true.
When Only One Side Is Bought In
The harder and more common situation is an engagement where the embedded team has internalized spec-as-code practices and the client is still working in tickets and Confluence. Lecturing the client about engineering maturity loses the engagement. Pretending the practice doesn’t matter loses the delivery.
What works is asymmetric discipline. The embedded team writes specs in the form they need, internally, even when the client hasn’t asked for one. They translate the relevant parts into whatever artifact the client expects, usually a conventional requirements document, or a structured ticket. They draw out the inputs they actually need from the client, such as the intent statement, constraints, and domain model edges, and capture those answers in the structured spec on their side. The cost of translation falls on the embedded team. It pays back many times over in avoided rework, because the team is still generating from a coherent artifact even when the client isn’t authoring one. And when the practice holds consistently, the client tends to notice through outcomes. That’s typically how spec discipline spreads across an organization: one clean delivery at a time.
What Executable Specifications Demand from Engineering Leaders
That dynamic, the discipline demonstrated through delivery, is what engineering leaders need to actively protect and resource if intent-first practices are going to take hold at scale.
Three things, in particular, fall on engineering leaders to drive.
The first is the cultural reframing of who owns specification quality. In most organizations, a vague spec is a product management problem at worst, a cost of doing business at best. In an intent-first organization, a vague spec is a defect surfaced and fixed before generation begins. It carries the same seriousness as a vague type signature in a critical service. Leaders who tolerate spec ambiguity are quietly opting their team out of intent-first delivery, no matter what tools they buy.
The second is the willingness to slow down up front in exchange for moving faster downstream. Spec authoring takes longer than ticket writing. A well-run spec review takes longer than a typical sprint planning meeting. The investment pays back in reduced rework, but only if leaders protect the upstream time. Teams under pressure to ship will always cut spec time first if leaders do not actively defend it.
The third is treating specifications as code in a structural, not just philosophical, sense. The spec lives in the repository. It is reviewed in the same pull-request system. It has linters that block bad commits. It is deployed alongside the system it describes. Most organizations keep specs in a separate system from the code, and that single structural choice is what makes them drift, decay, and eventually become fiction.
The Engineering Discipline We Have Been Avoiding
Requirements engineering has been the unloved corner of the software profession for forty years. Tolerated, outsourced to product management, complained about, and in most teams, quietly skipped. That has been survivable because there has always been a human translation layer between the spec and the system: a developer who could fill in gaps with judgment, ask a clarifying question, or build something close enough to correct later.
That layer is shrinking. The teams that figure out how to write specifications with the same rigor they apply to code (version control, review, testing, modular design, separation of concerns) will define what intent-first delivery looks like for everyone else. The work is not glamorous. It is the unglamorous work that has always quietly determined which engineering organizations actually ship.
Key Takeaways
- An executable specification has five components: intent statement, domain model, behavioral rules, constraints, and acceptance evidence. Skipping any one of them produces downstream rework at machine speed.
- Specs must be co-authored by product owners, domain experts, and engineers in structured review sessions, not drafted by one role and circulated for comment.
- The most common failure mode is specifying the implementation rather than the intent — encoding how before deciding what.
- In embedded-team engagements, spec discipline often needs to be held asymmetrically: the engineering team maintains the structured spec internally even when the client hasn’t adopted the practice.
- Specifications belong in the code repository, reviewed in the same pull-request system as code — keeping them in a separate system is the single structural choice that guarantees they become fiction.

