Best of Both Worlds: The Real Case for Purpose-Built Tooling in the Agentic Stack
The companion to Where the Profits Live: A Map of the MCP Layer. The map placed the specialist in the architecture. This piece is what the specialist does inside it.
Perspective
~20 min read
Horizontal AI is now the default generator
In most engineering organizations, the first version of an artifact no longer comes from a human. It comes from a prompt. Requirements, design descriptions, analysis drafts, traceability summaries, and early test cases are being drafted by general-purpose models inside the tools engineers already live in.
This is not an argument against that shift. Horizontal AI belongs in engineering workflows; engineers have decided that for their organizations by using it. The argument has moved on.
Leadership is now asking the question, or is about to: with AI everywhere, why keep a specialized tool when AI agents can do the checking? It is the right question. Here is the answer.
The category is in motion. New AI-native platforms, incumbent RM tools adding their own AI, teams building the layer themselves, and our own bet on MCP are all in the conversation. We will examine each as the piece develops, and make the case for the one we are betting on.
The line under everything
Horizontal AI produces artifacts. Engineering systems consume structured inputs. The specialist exists in the boundary between the two.
Every other argument in this piece is a longer version of that line. The artifact your horizontal assistant produces is fluent, plausible, and sometimes correct. The system downstream of that artifact (validation, traceability, test generation, certification, V&V, the AI tools your organization is now licensing for each of those) does not consume fluent and plausible. It consumes structure, stability, and repeatability. Generation does not produce those properties. They have to be enforced.
The cost of fixing a requirements error is not uniform across the lifecycle. It is exponential. Requirements errors cost US businesses an estimated $30 billion per year, according to one Software Engineering Institute analysis. The defect-cost curve is well documented: based on NASA’s study on error cost escalation and parallel research at IBM’s Systems Sciences Institute, a defect caught in the requirements stage costs 1 unit to fix. The same defect caught in design costs 6 times more. In build, 11 times more. In integration and test, between 21 and 78 times more. In release, 90 times more, before counting recall, reputation, or regulatory exposure. The Project Management Institute puts numbers on the prevalence: nearly half of unsuccessful projects (47%) fail to meet their goals because of inaccurate requirements management, and for every billion dollars an organization spends on projects, fifty-one million are wasted on poor requirements practice. A customer who had been using QVscribe for five years estimated that the tool caught roughly 80% of requirement issues at the authoring stage that would otherwise have surfaced downstream. The calculated savings: approximately $220,000 per year in avoided rework labor alone, on a single team, for a single year.
A program engineer at a federal aerospace research organization framed the same equation from the downstream end: “Reducing your test time and cost, which for us is very high — probably higher than requirements and design and code — is the test cycles by far the highest. So how do we show that improving the quality of the requirements also reduces the amount of time necessary to test them.”
The structural gap your stack is now absorbing
Most engineering systems were not built for probabilistic input. They assume consistent structure, stable semantics, repeatable evaluation, and deterministic transformation of inputs into outputs. None of those properties are the default output of horizontal AI.
What horizontal AI produces is, at the level of a single artifact, hard to fault. The text is syntactically valid, the form recognizable, the fluency high. At the scale of one requirement, on one engineer’s screen, the output looks usable. The mismatch shows up when those individual artifacts have to aggregate into a system: across twelve engineers, three teams, two product lines, and one certification cycle. At that scale, what you are absorbing is not output but variance: semantic variance across runs of the same prompt, structural variance across engineers using the same tool, configuration variance across teams that never agreed on the same standards.
The divergence between a general-purpose AI and a specialist tool is not obvious at the scale of one requirement. It becomes visible at the scale of a document and sharp at the scale of a program. A team at a major medical device company ran the same requirements set through both Copilot and QVscribe. The surface-level results looked similar. Copilot produced no change record, fragmented the original requirements into approximately thirty additional items, and required the reviewer to hold two documents open simultaneously to reconstruct what had changed.
“I did have to open 2 separate documents, and then I had to do a comparison. Then I also had to think about what did they change because they didn’t have any information… it actually became a few more additional 30 requirements or so. So I can imagine if you have like 1,000 requirements… it would be more time consuming for sure.”
The horizontal assistant produced output faster. The downstream cost of consuming that output (reconciling, mapping, verifying) was invisible until someone actually did the work. Syntactically correct, semantically hollow.
A senior engineer at a global oil and gas standards body named the practical version of this directly: “ChatGPT doesn’t… for spec writing document writing. That’s a no-go.” He was not arguing that ChatGPT cannot produce text. He was naming the shape of the work where that text does not become a reliable input. The gap is not correctness. The gap is structure.
What changes when horizontal AI is the default
The introduction of horizontal AI into the workflow does not remove the engineering system’s need for structured input. It shifts where the structuring happens.
Before, structure was enforced at the point of authoring. The engineer wrote in the form what their downstream systems required. Now, structure has been pushed downstream: from requirement creation to validation systems, from drafting to normalization, from human consistency to system-level reconciliation. The horizontal assistant’s output stops being the work product. It becomes an unstructured input stream entering deterministic systems that were never designed to interpret it directly.
Some of your team is already absorbing the cost of that mismatch. The engineers who reformat AI output into EARS before feeding it into the RM tool. The cleanup engineer whose title says systems engineer and whose day is data hygiene. The shadow workflow that exports to Excel, runs Copilot, and re-imports because the official path was the wrong shape. These are not anomalies. They are what happens when a system that expects structure is forced to consume probability. A familiar complaint inside systems engineering teams is that the role has inverted: document-tracking is now the work, and engineering is the interruption.
The shadow workflow problem is not limited to export-to-Excel workarounds. It appears at the authoring layer too, as engineers route individual requirements through whichever AI tool is closest. A requirements program lead at a precision engineering manufacturer described the pattern directly: “One of the comments we’ve got back is people are going off and using AI — Copilot, whatever — in order to refine their requirements.” His response was not enthusiasm. It was a concern: “I’m a bit bothered by people going off and kind of using AI to refine the stuff. I think it’s kind of missing the point of trying to establish consistency.”
The horizontal assistant is good at making individual requirements sound better. It is not good at making an organization’s requirements mean the same thing. Those are different problems.
What the specialist actually does
Specialist systems are often framed as “AI for requirements” or as competitors to whatever horizontal capability the organization has just bought. Neither framing is right. The engine is not a specialist generator competing with the horizontal generator. It is a different engine entirely, doing four jobs the generator cannot do on its own.
Structuring. Take free-form text, whether AI-generated or human-authored, and convert it into atomic, machine-readable, standards-aligned requirements that downstream systems can actually consume. This is what turns the horizontal assistant’s draft into something the rest of the stack can use.
One of the more direct expressions of what structuring delivers came from a program manager at a defense electronics manufacturer deploying QVscribe across a systems engineering team: “I think we could assign a lower-level engineer to write those requirements and work with QVscribe and put out something that a seasoned engineer would put out.” This is not a claim about AI generating requirements. It is a claim about what happens when an organization’s standards, the ones its senior engineers carry in their heads, are externalized into a system that any engineer can run. The specialist is not replacing the senior engineer. It is making the senior engineer’s judgment portable.
Validation. Apply deterministic, repeatable evaluation against your organization’s standards, your glossary, and your configuration. Not interpretation. Not generation. An assessment that produces the same answer for the same input every time, with a record of what was checked and why. A clinical engineer at a major medical device manufacturer named the architectural reason for this separation: “You shouldn’t have the LLM both generate it and measure it.” The same model that wrote the requirement cannot honestly grade it. The grade is what the system downstream consumes.
Stability. Preserve the configuration that produced today’s result, so tomorrow’s result is comparable. Preserve the standards your senior engineers have set, so they survive a generation of retirements. A senior engineer at a global oil and gas standards body named that risk plainly: “There’s a lot of knowledge leaving the industry. There are a lot of people leaving now in their fifties and sixties, and that knowledge is up here.” The specialist is where that knowledge gets externalized into a form the system can keep using after the engineer is gone.
Calibration. The quality model the specialist applies is calibrated across industries (medical device, aerospace, defense, automotive, semiconductors) and is not overfit to any one organization. The standards your team enforces with the specialist reflect what regulated engineering actually requires across the discipline, not what a single team’s internal preferences happen to be. That distinction matters when a new team is being onboarded, when an audit asks where the standards came from, or when leadership wonders whether your in-house rules are the right ones. The answer is not “we wrote them.” The answer is “the discipline did, and the specialist enforces them consistently.”
Structuring, validation, stability, and calibration are not features. They are properties that the rest of the engineering stack requires to function. Generation does not produce them.
The dependency chain nobody is naming
Generation does not reduce the need for structure. It increases it.
Acceleration in upstream generation creates more downstream normalization work: more variation to reconcile, more inputs to validate, more inconsistency between engineers to resolve, more implicit organizational knowledge to externalize so a system can apply it consistently. The license your organization is paying for the horizontal assistant produces output. The cost of making that output safe for the rest of the stack to consume has to be paid somewhere. If it is not paid by the specialist, it is paid by your engineers, in hours. We traced the surface area of that cost in The Hidden AI Tax in Regulated Engineering.
The dependency runs deeper than most organizations have yet mapped. Several customers are deploying downstream AI tools across their V-cycle, including automated test generation, model synthesis, and trace analysis, which depend entirely on requirements being written in EARS format to function. The bottleneck is not those tools. It is what feeds them. An engineering lead at a global automotive supplier described the situation plainly: “We have AI tools which we can use if we have requirements written in EARS format for the other stages in the V-cycle. These are already mature enough to be used and provide certain efficiency. But it’s very difficult to end up with EARS-written requirements.” The downstream AI is ready. The input it needs is not. The specialist is what closes that gap, not by generating requirements, but by enforcing the structure that makes every downstream AI tool in the stack usable.
What the specialist is becoming
Per-requirement scoring is what the specialist did first. The role is expanding.
Three capabilities, already in active development, point to where this engine is going. The first is cross-document and cross-program coherence: surfacing similarity, contradictions, and redundancies across documents and across programs that no single engineer reading any one document would catch on their own. The second is timeline-level quality tracking at the project, document, and requirement level, so a team can see how quality evolves over the life of a program and distinguish iteration from drift. The third is coherence between requirements and code: a structured representation that lets corporate AI assistants reason over requirements and code together rather than treating them as separate worlds.
Each of those capabilities is the specialist doing more of the work the engineering system needs at the program scale, not less. Generation is going to keep expanding. The structuring, validating, stabilizing, and calibrating work the specialist does is expanding with it.
There is one further capability worth naming. As MCP-style protocols become the standard for tool invocation, the specialist becomes directly callable from your team’s corporate AI assistant. Copilot, Claude, or Rovo, whichever one your organization has on contract, invokes the specialist transparently when the answer needs the structuring layer. The piece-one architecture, in working form.
Four bets on the same problem
The category has more than one architectural answer to the problem horizontal AI is creating. A new generation of AI-native requirements platforms is betting that the right response is to start fresh: leave the RM platform your team already uses, migrate your requirements into a new system of record, rebuild your integrations, retrain your engineers, and resubmit your audit trail to the certifying authority that already approved the last one. The pitch is that AI-native is worth that cost.
The incumbent RM platforms are making a different bet: keep the tool you already use, and add the AI to it. The capability lives where the requirements already live, which solves the migration question. But that AI is general by design, calibrated to fit the broadest possible customer base rather than to your specific engineering culture or standards. It is also bounded by what the RM platform itself can do. Delivering robust capabilities will often require tying in a specialist, which leaves you in one of two places. The line-item math does not get simpler: you are still paying for the RM platform, still paying for horizontal AI across the rest of your stack, and now paying for a third AI surface inside the RM platform that the other two cannot see. Or your team does without specialized inputs entirely.
Some organizations are betting on something different again: build the structuring layer themselves. A consultant who advises engineering teams on requirements quality put the conclusion plainly: “I don’t think the whole-life cost, if you have your own tool, your guys will easily spend $50K, and then they need to maintain it, and then you get a poorer product.” That reframes the licensing question entirely. The bundled AI license is visible. The cost of building and maintaining the structuring layer yourself does not occur until an engineer has spent six months on it, and the result still does not enforce INCOSE rules reliably.
We are betting on MCP. The specialist does the structuring, validation, stability, and calibration that the engineering stack needs, against requirements that stay exactly where they live. It is callable by whatever AI your team is already using: the horizontal assistant in your team’s daily tools, or the AI capability inside the RM platform itself. The same specialist, the same configuration, the same record, available wherever the work is happening. The first bet asks your organization to start over. The second asks it to settle for general where it needs specific. The third asks it to build what it could buy. Ours asks it to keep going. There is enough disruption in engineering right now without paying extra for it.

Why it matters now
Piece one named the change in distribution. MCP-style protocols are making it easier for AI systems to connect directly to specialist tools and to each other. That connectivity is real, and it is advancing. But connectivity does not resolve structure. It exposes the gap more directly. The question is no longer whether two systems can be wired together. The question is whether what flows between them is stable enough to be safely consumed.
A senior engineer at a Fortune-100 industrial customer described the architecture his team is moving toward: “Some of that can be done with things like model context protocol and other elements. So we can ping the requirements thing, and then we can ping something else and do our analysis of completeness.” The horizontal assistant pings the specialist when the answer has to be defensible. The specialist returns a structured, validated, stable result. The completeness analysis then runs on that result. That chain only works if each system can trust the input from the one upstream. The specialist is what makes that trust possible.
What the “best of both worlds” actually means
The phrase is often used loosely, as if it described a choice between tools. It does not. In practice, it describes a pipeline.
Horizontal AI generates candidate artifacts. So does the AI inside your RM platform. The specialist takes whatever they produce and structures, validates, and stabilizes it against your standards. Engineering systems then consume the validated input. Downstream AI tools operate on stable, governed data. Each layer depends on the one before it, being transformed into something usable. Without the transformation layer, downstream systems degrade. Not because the generation is poor, but because what they are consuming is not stable.
The case for the specialist is not that it generates better text. It is that it is the layer that makes everything downstream of generation possible.

The position
The validation failure is reproducible. A defense shipbuilder running internal trials of horizontal AI for requirement rewriting found a consistent pattern: the rewrites were linguistically acceptable, but rule coverage was incomplete. “We have more than 40 rules in INCOSE. And to have a check over all these rules, and to be sure that all these rules have been checked — this is what does not work for such an easy prompt.”
That gap is what governance is built to close. Horizontal AI writes text that sounds correct. It does not produce the record of which rules were enumerated, which were applied, and which were missed. The specialist does. That record is the governance layer your engineering stack relies on, and what makes the output auditable, repeatable, and consumable by every downstream system.
AI is going to generate engineering artifacts, whether through horizontal assistants or through the AI capabilities being added inside the RM platforms themselves. That is no longer the question. The question is whether the structure, stability, validation, and calibration the rest of the engineering stack needs are produced by a system designed for them, or absorbed by engineer hours, license waste, and the slow decay of downstream AI tools that cannot work on inputs no one configured. Piece one mapped where the value goes next in the agentic stack. This piece names where it accrues. There is no version of the next decade where that absence stays invisible. There is no version where horizontal AI fills it. There is no version where platform-embedded AI fills it either. The specialist is the only system in the stack designed to.