The Hidden AI Tax in Regulated Engineering
Why Engineers Are Absorbing the Gap Leadership Hasn’t Named
Essay
~10 min read
“We have AI tools we can use if we have requirements written in EARS format for the other stages in the V-Cycle, but it doesn’t pay off because the manual authoring effort is too painful.” That is a senior engineer at a global automotive supplier.
The downstream AI is ready. The upstream is not. That gap sits between the requirements layer and every AI tool an engineering organization is investing in. And it is the entire reason most AI requirements pilots are stalling.
The bottleneck nobody is naming
Most AI tools designed for regulated engineering assume something the buyer rarely has: structured, atomic, machine-readable inputs. Without them, test generation, code synthesis, and verification tracing all break on requirements that mean three different things to three different engineers. We unpacked why in From Rules to Intelligence.
Organizations are buying AI tooling on top of an input layer that cannot support it. Cloudera and Harvard Business Review reported that only 7 percent of enterprises consider their data fully AI-ready. Gartner has projected that through 2026, organizations that fail to operationalize AI-ready data architectures will see at least 60 percent of their AI projects abandoned. In regulated engineering, where the data is specifications and design artifacts rather than customer behavior, that number is almost certainly worse.
The pattern
What makes this a pattern rather than a one-off is that organizations in every regulated engineering industry are hitting it independently, and almost none are comparing notes.
A global medical device manufacturer is formalizing an AI operating model and has described their internal posture on requirements quality measurement as going from zero to one. At a defense systems prime, leadership is openly asking what AI tools can help break requirements out and flow them down faster, while the engineering council at the same company is simultaneously warning that engineers should not be pushing buttons and letting AI make decisions on their behalf. A federal aerospace research organization is running formal internal studies on large language models for requirements testability evaluation. At a precision engineering company, leadership is watching engineers go off and use AI on requirements without organizational oversight, because the sanctioned process cannot keep up with what individual engineers want from the tool.
Five different organizations describing the same underlying problem in five different vocabularies. None of them is solving for it as a category, which is why the bottleneck persists.
What this looks like on the ground
The bottleneck shows up as habits, not missing tools.
At a clinical diagnostics manufacturer, engineers using Copilot reported having to re-provide the same context every session because the tool has no persistent memory of the rules and standards governing their requirements. The habit feels convenient and its cost has become invisible, but it is a manual workaround posing as a workflow.
At a medical technology company, engineers tested Copilot rewrites of an existing requirements set, and the output produced thirty additional requirements with no structured change mapping. The team’s response was to open two separate documents side by side and manually compare them line by line. That is forensic reconstruction.
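What the team lacked is easy to describe in data terms: a structured change map that records, for every requirement in the old set, what happened to it in the new one. A minimal sketch of that artifact, using invented requirement IDs and a crude text-similarity heuristic standing in for whatever matching logic a real tool would use:

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

# Hypothetical requirement sets: id -> requirement text.
OLD = {
    "REQ-001": "The pump shall stop within 2 seconds of a fault signal.",
    "REQ-002": "The display shall show the current flow rate.",
}
NEW = {
    "REQ-001": "When a fault signal is received, the pump shall stop within 2 seconds.",
    "REQ-003": "While in service mode, the display shall show diagnostic codes.",
}

@dataclass
class ChangeRecord:
    req_id: str
    change: str       # "unchanged" | "modified" | "added" | "removed"
    similarity: float

def build_change_map(old: dict, new: dict) -> list[ChangeRecord]:
    """Classify every requirement ID across the two sets."""
    records = []
    for req_id, text in old.items():
        if req_id not in new:
            records.append(ChangeRecord(req_id, "removed", 0.0))
            continue
        ratio = SequenceMatcher(None, text, new[req_id]).ratio()
        change = "unchanged" if ratio > 0.99 else "modified"
        records.append(ChangeRecord(req_id, change, round(ratio, 2)))
    for req_id in new.keys() - old.keys():
        records.append(ChangeRecord(req_id, "added", 0.0))
    return records

for rec in build_change_map(OLD, NEW):
    print(rec)
```

Even a map this crude replaces the side-by-side, line-by-line comparison with a reviewable list. The point is that the artifact is cheap to produce when the tool is built to produce it.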
At a heavy industrial manufacturer, engineers ran prompts against more than forty INCOSE rules trying to validate AI-generated requirements, and concluded that ensuring every rule had actually been checked was something the prompt could not verify. The engineers ended up manually checking the rules after the AI ran.
These are not isolated stories. Researchers studying explainability and AI tools in regulated industries have catalogued the same workarounds across the field, with manual validation cited as one of the largest hidden costs of deploying AI into compliance-bound workflows. Wherever AI is introduced before the upstream is ready, the same five patterns emerge:
- The conversion tax. Engineers reformat requirements into EARS or another structured syntax before feeding them to the AI tool, then reformat the output back into whatever the system of record expects. A sketch of the gate that forces this conversion follows the list.
- The whisperer dependency. One or two engineers on the team learn how to prompt the tool well enough to get usable output. The rest of the team queues behind them. Bus factor of one. The whisperer becomes the de facto AI operating model for the team, and when they change organizations or retire, the institutional prompt knowledge walks out the door with them. At a global oil and gas standards body, the worry is explicit: “There’s a lot of knowledge leaving the industry. There are a lot of people leaving now in their fifties and sixties, and that knowledge is up here.” At a defense systems integrator, the primary internal champion retired mid-program, and the team had to scramble to rebuild both the advocacy and the working knowledge before the next review cycle.
- The shadow document. Teams maintain a real requirements set and an AI-readable requirements set in parallel, with manual reconciliation between them.
- The shadow workflow. Engineers bypass the sanctioned AI tooling, routing requirements through whatever general-purpose tool they have access to, even if it is ungoverned, because it is faster. At a medical technology manufacturer, Copilot has been adopted informally to generate and assess requirements with management’s awareness, but the output still requires manual SME review on the back end. At a test and measurement company, the quality lead flagged engineers independently routing requirements through generic AI tools as a consistency problem rather than a productivity gain: “I’m a bit bothered by people going off and using AI to refine the stuff. I think it’s missing the point of trying to establish consistency.” None of these organizations had set out to build a shadow workflow. The shadow built itself because the official tool was not fast enough, and the gap was real.
- The cleanup engineer. Junior staff quietly absorb input preparation as their de facto job. Their title says systems engineer. Their day is data hygiene.
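To make the conversion tax concrete: EARS (Easy Approach to Requirements Syntax) constrains every requirement to a handful of sentence templates, and a downstream tool that expects EARS rejects anything else at the gate. A minimal sketch of that gate, assuming a deliberately simplified regex check rather than a real EARS parser:

```python
import re

# Simplified EARS template patterns (illustrative, not a full grammar).
EARS_PATTERNS = [
    r"^The \w[\w ]* shall .+",               # ubiquitous
    r"^When .+, the \w[\w ]* shall .+",      # event-driven
    r"^While .+, the \w[\w ]* shall .+",     # state-driven
    r"^If .+, then the \w[\w ]* shall .+",   # unwanted behavior
    r"^Where .+, the \w[\w ]* shall .+",     # optional feature
]

def matches_ears(requirement: str) -> bool:
    """Return True if the requirement fits one of the EARS templates."""
    return any(re.match(p, requirement) for p in EARS_PATTERNS)

requirements = [
    "When a fault signal is received, the pump shall stop within 2 seconds.",
    "Pump stops fast on fault.",  # freeform legacy text: fails the gate
]
for req in requirements:
    print(f"{matches_ears(req)!s:5}  {req}")
```

Every requirement that fails a gate like this is one an engineer has to rewrite by hand before the licensed downstream tooling can touch it. That rewriting is the tax.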
This is the AI tax getting paid in engineer hours instead of line items. Which is exactly why it stays off the leadership radar. By the time the bottleneck reaches a budget conversation, it has been laundered through normalized workarounds and is no longer recognizable as the cost it actually is.
The view from higher up
There is a second reason this stays invisible at the leadership level, and it is harder to talk about. The larger AI investment defends itself, even when no one is consciously defending it.
In some organizations, the defense is active. Leadership has made a visible commitment: the Copilot license, the internal LLM platform, the AI strategy presented to the board. These are real budget items with real expectations attached, and when engineers start working around the tool, the framing from above is closer to a defense of the investment: the tool is in, and the team needs to use it.
In other organizations, the defense is quieter. The AI capability arrived bundled inside an enterprise license that the company is already paying for. It is part of the stack. Nobody is championing it, but nobody is questioning it either, because the cost is already sunk and forgotten. When a purpose-built tool comes up, the conversation rarely gets to a fair comparison. It stops at “we already have something for that,” even when the something was never built for the job. At a nuclear and industrial engineering firm, Glean had been deployed enterprise-wide with mandatory training and was being used to generate requirements from documents. The evaluator acknowledged it directly: “What it can do is write you a requirement — that doesn’t necessarily mean they’re going to be in EARS format.” The capability gap was visible, but the conversation still had to work against the weight of an already-committed budget line. At a medical device manufacturer, Copilot was actively being used for requirement generation and assessment. Management had accepted the output quality as sufficient. The SME review that caught what Copilot missed was simply absorbed into the workflow: invisible as a cost, normalized as a habit. That is exactly how a bundled tool defends itself without anyone in the room making the argument.
This is the same dynamic that played out in previous waves of enterprise software adoption. Organizations bought the incumbent platform because it was bundled with the rest of the stack and the budget was already committed. Users found something lighter that actually fit how they worked. Workarounds became standard practice. Eventually, leadership ratified what was already happening, but the gap between the official tool and the real one had been visible at the engineering level for quite some time, just not at the level where it could be priced. Fortune reported that 90 to 95 percent of enterprise AI pilots fail to reach production, and a large share of that failure traces back to precisely this kind of misread between what the organization bought and what its people are actually doing.
The same pattern is forming inside enterprise AI in regulated engineering right now. The official AI tool is the one leadership presented to the board. The actual workflow is whatever engineers have cobbled together to make their requirements legible enough for the official tool to work on top of. The workarounds themselves are unauditable. A shadow document, a consumer LLM session, a whispered prompt that only one engineer knows: none of these survives a notified body audit, an ISO 26262 functional safety assessment, or a DO-178C stage of involvement review. The longer the AI tax compounds in shadow workflows, the harder it becomes to retrofit traceability after the fact. Read the workarounds as data: that is what they are. They are the most honest map a leadership team will ever see of what the official AI investment is actually missing.
The architectural answer
What sits underneath this problem is the comprehension layer Jordan Kyriakidis described in The Wrong Question: a structured representation of the engineering knowledge an AI tool needs in order to reason over a program reliably. Vocabulary reconciled across artifacts. Trace relationships made queryable. Rationale captured before it retires with its author. Addy Osmani has named the cost of missing this kind of layer “comprehension debt” in the context of AI-generated software code: code that ships faster than anyone can fully understand, with the understanding postponed until something breaks. Regulated engineering is carrying the same debt at the requirements layer, with certification stakes layered on top.
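In data terms, a comprehension layer makes vocabulary, trace links, and rationale first-class fields instead of tribal knowledge. A minimal sketch of the kind of record described above, with invented field names standing in for whatever schema a real implementation would define:

```python
from dataclasses import dataclass

@dataclass
class GlossaryTerm:
    term: str
    definition: str       # one reconciled meaning across all artifacts

@dataclass
class Requirement:
    req_id: str
    text: str
    terms: list[str]      # links into the reconciled glossary
    traces_to: list[str]  # queryable trace links (design, tests, parents)
    rationale: str        # captured before it retires with its author
    source: str           # artifact and version the requirement came from

glossary = {
    "fault signal": GlossaryTerm("fault signal", "Output of the safety monitor, active-high."),
}

req = Requirement(
    req_id="REQ-001",
    text="When a fault signal is received, the pump shall stop within 2 seconds.",
    terms=["fault signal"],
    traces_to=["DES-014", "TEST-203"],
    rationale="2 s bound derives from the hazard analysis for over-pressure.",
    source="SysSpec v3.2",
)

# Two questions an auditor might ask, answered as lookups:
print(req.req_id, "->", req.traces_to, "|", req.rationale)
for term in req.terms:
    print(term, "=", glossary[term].definition)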
A comprehension layer alone is not enough. Once the input is clean, the next problem is whether anyone can prove the AI’s output was correct. In Jordan’s framing, the AI tools most organizations have today are scribes. They draft, generate, and rephrase. What regulated engineering also needs is the gavel: a system that scores AI outputs against the safety, traceability, and certification requirements a program will be audited against, and that records the chain of evidence behind every decision. The gavel does not replace the engineer. It gives the engineer a defensible record. The gavel is what stops the engineer from manually re-checking forty INCOSE rules, two compliance standards, and a project-specific style guide every time the AI returns an answer. INCOSE itself has begun explicit programming on bringing requirements engineering into the AI age, a signal that the discipline is starting to recognize this as a category-level problem rather than a tooling question.
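A minimal sketch of the gavel’s core loop, with two hypothetical stand-ins for the forty-plus INCOSE-style checks: every rule is enumerable, every rule returns a verdict, and every verdict is recorded, so “was every rule actually checked?” becomes a query over the record rather than a manual re-audit:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Verdict:
    rule_id: str
    passed: bool
    evidence: str
    checked_at: str

# Hypothetical stand-ins for INCOSE-style rules; a real system would
# register one callable per rule in the applicable rule set.
def no_vague_terms(text: str) -> tuple[bool, str]:
    vague = [w for w in ("quickly", "user-friendly", "adequate") if w in text]
    return (not vague, f"vague terms found: {vague}" if vague else "none found")

def single_shall(text: str) -> tuple[bool, str]:
    n = text.count("shall")
    return (n == 1, f"'shall' appears {n} time(s)")

RULES = {"R7-vagueness": no_vague_terms, "R19-atomicity": single_shall}

def adjudicate(text: str) -> list[Verdict]:
    """Run every registered rule and record a verdict for each one."""
    now = datetime.now(timezone.utc).isoformat()
    return [
        Verdict(rule_id, *rule(text), checked_at=now)
        for rule_id, rule in RULES.items()
    ]

for v in adjudicate("When a fault occurs, the pump shall stop quickly."):
    print(v)
```

The completeness guarantee the heavy-industrial team could not extract from a prompt falls out of the structure: the rule registry is enumerable, so a skipped check is detectable by construction.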
The engineer remains essential through both layers. They provide the context the comprehension layer cannot infer on its own, the judgment the gavel cannot synthesize, and the design intent that no model has access to without them. What the architecture removes is the manual reformatting, the side-by-side document comparison, the prompt engineering tax, and the forensic reconstruction. What it gives back is engineering time spent on engineering.
The deployment posture matters as much as the architecture itself. The comprehension layer and the evaluation system are what make the architecture sound. The deployment posture is what makes it admissible: customer data segregated from model training, exportable audit logs that survive a review cycle, version pinning so a notified body can see exactly what the AI saw at the moment a decision was recorded, and clear data ownership boundaries between vendor and customer. Sergey Irisov, writing in European Business Review, has framed the wider governance wrapper for AI in regulated engineering: decision-boundary tiers, lifecycle data ownership, model versioning, and audit-grade traceability. The comprehension layer and the evaluation system sit underneath that wrapper, at the input and adjudication layers, where the workarounds documented above are actually being paid for.
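Those posture requirements translate directly into what each logged decision has to carry. A minimal sketch of one exportable audit entry, with invented field names, pinning the model version, the rule-set version, and a hash of the exact input so a reviewer can see what the AI saw:

```python
import hashlib
import json

def audit_record(req_text: str, decision: str) -> dict:
    """One exportable, self-describing entry in the audit log."""
    return {
        "input_sha256": hashlib.sha256(req_text.encode()).hexdigest(),
        "model": {"name": "reqs-evaluator", "version": "1.4.2"},  # pinned
        "ruleset_version": "incose-gtwr-v4",                      # pinned
        "decision": decision,
        "data_owner": "customer",  # boundary: vendor never trains on this
    }

rec = audit_record(
    "When a fault signal is received, the pump shall stop within 2 seconds.",
    "pass",
)
print(json.dumps(rec, indent=2))
```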
This is what bundled enterprise AI cannot deliver. A general-purpose copilot trained on the open internet, hosted in a vendor cloud, without persistent project memory and without an evaluation system bound to certification standards, will not stand up to what regulated engineering actually requires. Not in a notified body audit. Not in an ISO 26262 functional safety assessment. Not in a DO-178C compliance review. The bundled tool was built for a different problem.
Scale or stall
At one of the regulated engineering organizations referenced above, leadership recently asked engineering how long it would take to make their existing requirements set AI-ready. The answer was approximately three months. The engineering team had to plan around that timeline before any of the downstream AI tools the organization had already licensed could be put into production use. Three months of senior engineering capacity, against a license clock that had already started ticking. That is the shape of the AI tax when it finally surfaces in a leadership conversation.
Organizations that treat this as a category-level challenge and invest in the underlying architecture early will scale. Those that delay will not avoid the work. They will just end up doing it later, under regulatory pressure and with auditors watching. The EU AI Act begins enforcing high-risk obligations on August 2, 2026, with penalties reaching up to 7 percent of global turnover for the most serious violations. At the same time, the certifying authorities and notified bodies that govern each industry continue on their own audit cadences.
Purpose-built tools fail without the architecture. Bundled enterprise AI fails because it cannot become purpose-built.