Most AI writing assistants treat your documentation as flat text. You paste in a paragraph, the AI rewrites it, and you accept or reject. What the AI cannot know: that paragraph is a shared component used in 6 other topics, that it renders only under an "enterprise" condition, and that 3 cross-references point to it. Structure-aware AI, where the system understands components, conditions, variables, and cross-references as entities in a content graph, is what separates verification-capable AI from glorified autocomplete.
Michael Iantosca, Senior Director of Content and Knowledge Platform Engineering at Avalara, has spent the last year writing about this problem from the enterprise side. His diagnosis is blunt: LLMs produce what he calls "Splenda reasoning" — output that "cosmetically resembles deterministic logic while remaining fundamentally probabilistic underneath." You cannot ship a deployment procedure based on statistical plausibility. You cannot update an API reference with an answer that is probably right. Documentation demands determinism, and LLMs are not deterministic systems.
The question is what sits between the LLM and your documentation to make the output trustworthy. Iantosca's answer is a full enterprise stack: DITA XML, OWL ontologies, RDF knowledge graphs, SPARQL queries, and SHACL shape constraints. If you are managing millions of words across regulated domains, that rigor is earned. But the underlying principle applies at every scale: structure external to the LLM is what makes AI-assisted documentation reliable. The question for smaller teams is how much of that structure you need, and how to get it without the infrastructure overhead.
I built Topicary to test that principle at a scale accessible to small teams. Here is what I learned about what structure-aware AI can and cannot do — and how to evaluate whether your documentation is ready for it.
The knowledge collapse problem
The gap between perceived and measured AI productivity is real. METR's 2025 randomized controlled trial found that experienced open-source developers took 19% longer to complete real tasks in their own repositories when using AI tools, even though they believed afterward that AI had made them 20% faster. Technical writer Tom Johnson's "cyborg" model gets the diagnosis right: writers are augmented by AI, not replaced, and the real work shifts to verification.
But the problem goes deeper than individual productivity. Iantosca describes a systemic risk he calls "the coming collapse of corporate knowledge": AI systems are parasitic on the quality of human-generated content. As organizations cut documentation professionals and rely on AI to maintain content, models increasingly train on — and retrieve from — stale, unstructured, ungoverned information. The result is what he calls an "echo chamber of decaying truths" where AI feeds on progressively degraded content.
This is not an abstract risk. If your documentation has no version tracking, no provenance metadata, no structural relationships, there is no mechanism to identify what is current and what is obsolete. An AI assistant retrieving from this content will surface answers with equal confidence whether the source was updated yesterday or three years ago. The writer who validates that content — who tests the API, runs the procedure, checks the screenshot — is what keeps the system honest. Iantosca describes these people as "corporate truth validators" performing "investigative journalism" against the organization's own content. Remove them without replacing that validation function with structural governance, and the content decays silently.
The teams that will struggle are not the ones that adopt AI. They are the ones that adopt AI without structural foundations. AI can make you more productive, but only when the content architecture gives it something to reason about beyond flat text.
AI shifts the bottleneck from writing to verification
When content generation is instant, the writer's deliverable is no longer the document itself. It is the trustworthiness of the document. Every page of technical documentation contains assertions that could be tested: does this API endpoint return data? Does this UI match the screenshot? Does this procedure produce the stated outcome? As Johnson puts it, "the accuracy of our content is becoming almost my main deliverable."
This is where flat-content AI falls short. When you rewrite a paragraph in a Markdown file, your AI assistant sees that paragraph and nothing else. It cannot know that the same paragraph appears verbatim in 5 other pages. It cannot know that a conditional block should only render for the enterprise plan. It cannot know that the variable {{api_base_url}} resolves to different values across your staging and production documentation sites. Every one of these is a verification problem, and every one requires the writer to catch it manually.
Iantosca frames this as a fundamental architectural choice: "Structure is not about formatting or presentation. It is about making meaning explicit, parsable, and durable" so machines process content deterministically rather than probabilistically. When content conforms to a schema — when a component reference must resolve, when a condition must use valid dimensions, when a variable must exist in the project — machines do not need to guess what the content means. They can verify it.
In practice, this changes every AI interaction. When AI generates content that references {{api_base_url}}, the system validates that variable against the project's real entity list — AI cannot invent variables that do not exist. When AI rewrites a block tagged with an enterprise condition, the system verifies the output respects condition boundaries. When AI suggests a cross-reference, the target topic must resolve. The AI does not just generate text — every structural claim in the output is validated against the documentation's actual architecture.
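A minimal sketch of that validation step, assuming a hypothetical `ProjectEntities` registry loaded from the content graph and hypothetical `[[xref:topic-id]]` and `<if dimension="value">` markup (only the `{{variable}}` syntax comes from the examples above). Every structural claim in an AI draft is checked against what actually exists in the project:

```python
import re
from dataclasses import dataclass


@dataclass
class ProjectEntities:
    """Hypothetical snapshot of the entities that exist in a docs project."""
    variables: set[str]
    conditions: dict[str, set[str]]  # dimension -> allowed values
    topics: set[str]                 # valid cross-reference targets


VARIABLE_REF = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")
CONDITION_REF = re.compile(r'<if\s+(\w+)="([\w-]+)">')  # hypothetical markup
XREF = re.compile(r"\[\[xref:([\w-]+)\]\]")             # hypothetical markup


def validate_output(text: str, project: ProjectEntities) -> list[str]:
    """Return one error per structural claim in `text` that does not resolve."""
    errors = []
    for name in VARIABLE_REF.findall(text):
        if name not in project.variables:
            errors.append(f"unknown variable: {name}")
    for dimension, value in CONDITION_REF.findall(text):
        if value not in project.conditions.get(dimension, set()):
            errors.append(f"invalid condition: {dimension}={value}")
    for target in XREF.findall(text):
        if target not in project.topics:
            errors.append(f"unresolved cross-reference: {target}")
    return errors


project = ProjectEntities(
    variables={"api_base_url"},
    conditions={"plan": {"free", "enterprise"}},
    topics={"auth-overview"},
)
draft = '<if plan="premium">Call {{api_url}}/v2. See [[xref:auth-setup]].'
print(validate_output(draft, project))
# ['unknown variable: api_url', 'invalid condition: plan=premium', 'unresolved cross-reference: auth-setup']
```

A non-empty result sends the draft back to the writer. The check never asks the LLM whether its own references are valid; it compares them against the project's entity lists.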
Some verification can be automated outright. External links can be checked with HEAD requests — a weekly sweep catches broken URLs before readers do. API endpoints documented in reference topics can be probed to confirm they still respond. Code samples in JSON, XML, HTML, and YAML can be syntax-validated automatically. None of this replaces testing the actual procedure, but it catches the mechanical failures — the link that went 404, the API base URL that moved, the YAML sample with tabs instead of spaces — that waste a writer's time when discovered manually.
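These mechanical checks can be sketched with the Python standard library alone. This is an illustration, not Topicary's implementation: `check_link` sends a HEAD request and reports the status code, and `validate_sample` parses JSON and XML samples. YAML is only checked for tab indentation here, because a full parse needs a third-party library such as PyYAML. HTML is omitted because the standard-library parser accepts almost anything. Some servers answer HEAD with 405, so a real sweep would likely retry those with GET.

```python
import json
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET
from typing import Optional


def check_link(url: str, timeout: float = 10.0) -> Optional[int]:
    """Send a HEAD request and return the HTTP status, or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 for a link that went dead
    except OSError:
        return None  # DNS failure, refused connection, timeout


def validate_sample(lang: str, code: str) -> Optional[str]:
    """Return an error message for a malformed sample, or None if it passes."""
    try:
        if lang == "json":
            json.loads(code)
        elif lang == "xml":
            ET.fromstring(code)
        elif lang == "yaml":
            # YAML forbids tabs in indentation; a full parse would need PyYAML.
            for number, line in enumerate(code.splitlines(), start=1):
                indent = line[: len(line) - len(line.lstrip())]
                if "\t" in indent:
                    return f"line {number}: tab in indentation"
    except (json.JSONDecodeError, ET.ParseError) as err:
        return str(err)
    return None
```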
Radiologists have not been replaced despite years of AI hype in medical imaging, in part because someone has to sign off. But the radiologist's AI still does real work: it highlights the anomaly on the scan and shows similar cases from the database. It does not hand the doctor a raw image and say "good luck." Structure-aware AI does the same for documentation: it flags the anomaly (a rewrite that breaks consistency across 6 topics) and shows the structural evidence (here are the 6 topics).
Context belongs to the graph, not the prompt
In most AI writing workflows, the writer manually assembles context before every AI request — pasting in API specs, style guides, related pages, and code diffs. Johnson calls this "context engineering" and describes it as a core skill: "I almost never start a session without a bunch of context." In flat-content tools, this is correct. The writer is the context assembler because the AI has no way to know which files relate to the one being edited.
Iantosca argues that this gets the architecture backwards. In his DOM Graph RAG model, retrieval preserves the document object model — hierarchical relationships between sections, procedures, warnings, reusable snippets, metadata inheritance, and conditional content. Standard vector RAG splits this into chunks — whether by token count, sentence boundary, or semantic break — and stores them as embeddings that capture textual similarity but discard the hierarchical structure that tells you why a section exists, what it relates to, and what conditions govern it. DOM Graph RAG keeps that structure intact so the retrieved context carries its original semantic boundaries, not just its text. The LLM receives only the query and the structurally grounded context; retrieval, filtering, and relationship resolution happen outside the model.
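To make the contrast concrete, here is a small sketch of what structure-preserving retrieval returns. It is not Iantosca's implementation, and the `Node` fields are hypothetical. Instead of a bare text chunk, a hit carries its path through the document tree, its element kind, the conditions it inherits from its ancestors, and its siblings:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical document-object-model node."""
    id: str
    kind: str  # e.g. "topic", "section", "procedure", "warning"
    text: str
    conditions: dict[str, str] = field(default_factory=dict)
    children: list["Node"] = field(default_factory=list)


def index_nodes(root: Node) -> dict[str, tuple[Node, list[Node]]]:
    """Map every node id to (node, ancestors ordered from the root down)."""
    index = {}
    stack = [(root, [])]
    while stack:
        node, ancestors = stack.pop()
        index[node.id] = (node, ancestors)
        for child in node.children:
            stack.append((child, ancestors + [node]))
    return index


def structural_context(index: dict[str, tuple[Node, list[Node]]], node_id: str) -> dict:
    """Return a retrieved node plus the structure a flat text chunk would lose."""
    node, ancestors = index[node_id]
    conditions = {}
    for element in ancestors + [node]:
        conditions.update(element.conditions)  # inner elements override outer ones
    parent_children = ancestors[-1].children if ancestors else []
    return {
        "path": [a.id for a in ancestors] + [node.id],
        "kind": node.kind,
        "conditions": conditions,
        "siblings": [c.id for c in parent_children if c.id != node.id],
        "text": node.text,
    }


doc = Node("deploy", "topic", "Deploying the server", {"plan": "enterprise"}, [
    Node("prereqs", "section", "Before you begin"),
    Node("install", "procedure", "Run the installer", children=[
        Node("backup", "warning", "Back up the database first", {"platform": "linux"}),
    ]),
])
context = structural_context(index_nodes(doc), "backup")
print(context["path"], context["conditions"])
# ['deploy', 'install', 'backup'] {'plan': 'enterprise', 'platform': 'linux'}
```

A vector-only retriever would hand the model the sentence "Back up the database first" and nothing else. Here the warning arrives knowing it sits inside an install procedure and applies only to enterprise deployments on Linux.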
This scales; an LLM prompt window does not. As Iantosca puts it, no matter how large the context window becomes, the model still faces "attention allocation, retrieval consistency, instruction persistence, citation fidelity, and cross-document coherence" problems. Dumping an entire documentation repository into an LLM without structural governance is, in his words, "a budgetary cry for help, not architecture strategy."
In a CCMS with a content graph, context assembly is partially automated. When I built Topicary's AI chat, I made management queries such as "what topics cover the same ground as this one?" or "what breaks if I change this component?" resolve through graph traversal over actual component references and actual cross-references. These are deterministic lookups. Embedding similarity supplements them with a probabilistic signal (which topics are semantically close?), but the structural queries that answer "what references this component" or "what conditions apply here" are exact: the answers come from the content graph, not from asking an LLM to guess.
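The "what breaks if I change this component?" query reduces to a graph walk. A minimal sketch, assuming references are stored as hypothetical `(source, target)` pairs meaning "source includes target". Walking the reverse edges breadth-first returns every topic or component that uses the target directly or through nested components, with no model involved:

```python
from collections import defaultdict, deque


def build_where_used(references: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Invert (source, target) pairs, where source includes target."""
    where_used = defaultdict(set)
    for source, target in references:
        where_used[target].add(source)
    return where_used


def impact_of_change(entity: str, where_used: dict[str, set[str]]) -> set[str]:
    """Every topic or component that uses `entity`, directly or through nesting."""
    affected = set()
    queue = deque([entity])
    while queue:
        current = queue.popleft()
        for user in where_used.get(current, ()):
            if user not in affected:
                affected.add(user)
                queue.append(user)
    affected.discard(entity)  # a reference cycle can lead back to the start
    return affected


refs = [
    ("install", "auth-snippet"),
    ("admin-guide", "auth-snippet"),
    ("sso-component", "auth-snippet"),
    ("sso-setup", "sso-component"),  # reaches auth-snippet through nesting
]
print(sorted(impact_of_change("auth-snippet", build_where_used(refs))))
# ['admin-guide', 'install', 'sso-component', 'sso-setup']
```

Because the answer is a set of real IDs, the chat layer can quote it verbatim instead of paraphrasing a guess.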
| Capability | Flat-content AI | Structure-aware AI |
|---|---|---|
| Rewrite a paragraph | Generates new text. Writer manually checks other pages for the same paragraph. | Generates new text. Warns that this paragraph is a shared component in 6 topics. Shows diff preview. |
| Answer "what breaks if I delete this?" | LLM guesses based on text similarity across files it has seen. | Graph traversal returns the actual list of topics that reference this component. No LLM inference needed. |
| Assemble context for an AI request | Writer manually selects relevant files, API specs, style guides. | System automatically assembles structural context: components used, conditions applied, cross-references, surrounding topics. |
| Detect inconsistent terminology | LLM scans visible text for synonyms. Misses files not in context. | Terminology drift detection runs across the full project graph. Flags "setup" vs. "set up" vs. "set-up" across all topics. |
| Suggest content reuse | Not possible without seeing all files simultaneously. | Duplicate passage detection flags paragraphs that match content in 2 or more other topics. Suggests component extraction. |
| Validate AI-generated structural elements | No validation. AI can hallucinate variable names, invent cross-references. | Schema contract validation: AI cannot insert references to components that do not exist or create conditions with invalid dimension and value combinations. |
| Propose structural improvements | Not possible without understanding the content architecture. | Pattern detection across topics suggests missing condition dimensions, hardcoded values that should be variables, taxonomy tags from content clusters, and map splits when a single map spans multiple topic communities. |
| Surface content gaps | No feedback loop between readers and authors. | Reader queries with zero results cluster into actionable content gaps with suggested titles and confidence scores. |
| Measure AI-search readiness | No metric. You hope your content works for AI consumers. | Per-page scoring across structural clarity, semantic completeness, answer density, schema potential, and internal linking. |
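The duplicate-passage check in the table can be approximated with word shingles and Jaccard overlap. This is a stand-in, not the embedding-based scoring described later: it catches near-verbatim copies but not paraphrases. The 0.8 threshold and 3-word shingles are assumptions; the rule of flagging a paragraph only when it matches 2 or more other topics comes from the table.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Overlapping k-word sequences, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reuse_candidates(topics: dict[str, list[str]], threshold: float = 0.8,
                     min_topics: int = 2) -> list[tuple[str, int, list[str]]]:
    """Return (topic, paragraph index, matching topics) for each paragraph that
    closely matches a paragraph in at least `min_topics` other topics."""
    signatures = {t: [shingles(p) for p in paragraphs] for t, paragraphs in topics.items()}
    results = []
    for topic, paragraph_sigs in signatures.items():
        for index, sig in enumerate(paragraph_sigs):
            matches = sorted(
                other for other, other_sigs in signatures.items()
                if other != topic and any(jaccard(sig, s) >= threshold for s in other_sigs)
            )
            if len(matches) >= min_topics:
                results.append((topic, index, matches))
    return results


p = ("Generate an API key from the dashboard and store it in an "
     "environment variable before running the CLI.")
topics = {
    "a": [p, "Alpha-only text here."],
    "b": [p],
    "c": [p.replace("the CLI", "the command")],  # near-verbatim copy
    "d": ["Billing plans are charged monthly per seat."],
}
print(reuse_candidates(topics))
# [('a', 0, ['b', 'c']), ('b', 0, ['a', 'c']), ('c', 0, ['a', 'b'])]
```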
Your published docs have a new audience
A growing share of documentation traffic comes from machines, not humans. GitBook reported that AI-driven page views accounted for 41% of traffic on their hosted documentation by December 2025, up from 9% in January 2025. That is one platform's data, and publicly hosted developer docs likely attract more bot traffic than internal knowledge bases. But the direction is clear and the growth rate is steep. If your docs are not machine-consumable, they are invisible to a growing share of their audience.
Machine-consumable is not just a formatting choice, though. It is about what information your published output actually exposes.
A static HTML site gives an AI agent the rendered text. An llms.txt file gives it a page index. A per-page Markdown URL gives it clean content. These are necessary baseline steps; Topicary published sites serve all three, along with sitemap.md for agent path discovery. But a structured system can enrich even these standard formats. When each page in the llms.txt index carries its topic type, an agent retrieving a procedure knows it is a procedure before reading the content. When the index includes a "Related Pages" section derived from embedding similarity, the agent can follow semantic relationships between pages without crawling the entire site.
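A sketch of an llms.txt generator enriched this way, assuming pages with precomputed embedding vectors (the `Page.embedding` field is hypothetical). It follows the llms.txt convention of an H1 title, a blockquote summary, and H2 sections of Markdown link lists. Putting the topic type in the link note and pairing pages whose cosine similarity clears 0.8 are illustrative choices, not part of that convention.

```python
import math
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    topic_type: str         # e.g. "procedure", "concept", "reference"
    embedding: list[float]  # hypothetical precomputed vector


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def build_llms_txt(site: str, summary: str, pages: list[Page],
                   min_similarity: float = 0.8) -> str:
    lines = [f"# {site}", "", f"> {summary}", "", "## Docs", ""]
    # The topic type goes in the link note, so agents know what a page is before fetching it.
    for page in pages:
        lines.append(f"- [{page.title}]({page.url}): {page.topic_type}")
    lines += ["", "## Related Pages", ""]
    for i, first in enumerate(pages):
        for second in pages[i + 1:]:
            if cosine(first.embedding, second.embedding) >= min_similarity:
                lines.append(f"- [{first.title}]({first.url}): related to {second.title}")
    return "\n".join(lines) + "\n"


pages = [
    Page("Install the CLI", "https://docs.example.com/install.md", "procedure", [1.0, 0.0]),
    Page("CLI overview", "https://docs.example.com/cli.md", "concept", [0.9, 0.1]),
    Page("Billing", "https://docs.example.com/billing.md", "reference", [0.0, 1.0]),
]
print(build_llms_txt("Example Docs", "Docs for the Example CLI.", pages))
```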
The structural layer goes further. An MCP server lets an external AI tool — Claude Desktop, a custom agent, an internal workflow — query the documentation's component relationships, condition variants, and cross-references. An agent can ask "which topics reference the authentication component?" and get a structural answer, not a text-search approximation. This is what "machine-consumable" means when the content is structured: the machine consumes the relationships, not just the words.
There is a tempting shortcut: serve flat Markdown through these delivery layers and use AI workflows to keep the content current — AI reads code diffs and proposes documentation updates, AI translates pages, AI detects gaps by comparing reader queries to existing text. This approach treats AI as the governance layer over unstructured content. But when AI proposes a translation, nothing validates that the translation preserved the cross-references between pages. When AI updates a procedure because the code changed, nothing checks whether that procedure is a shared component used in 6 other topics. When AI detects a content gap, there is no content graph to verify the gap's relationship to existing coverage — just text similarity against flat files. Iantosca's observation applies: without schema enforcement, machines are "limited to guessing structure through statistical techniques." Delivery structure makes docs accessible to AI. Content structure — components, conditions, variables, cross-references as entities in a graph — is what makes AI reliable.
Knowing your docs are machine-consumable is one thing. Knowing how well each page actually performs in AI search contexts is another. A page with clear heading hierarchy, dense factual statements, structured data markup from its topic type, and internal cross-references will surface better in AI-powered search than a page of meandering prose with no structure. Measuring these factors per page — structural clarity, semantic completeness, answer density, schema potential, internal linking — gives authors a concrete score to optimize against, not a vague directive to "write better for AI."
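Two of the five factors are easy to sketch directly from a page's Markdown. The heuristics and weights below are hypothetical: structural clarity is taken as the share of heading transitions that do not skip a level, internal linking saturates at three relative links, and the other three sub-scores are assumed to come from separate checks. A real implementation would also skip lines inside code fences, which this regex would misread as headings.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+\S", re.MULTILINE)
INTERNAL_LINK = re.compile(r"\]\((?!https?://)[^)]+\)")  # relative Markdown links only

# Hypothetical weights for the five factors; they sum to 1.0.
WEIGHTS = {
    "structural_clarity": 0.25,
    "semantic_completeness": 0.20,
    "answer_density": 0.20,
    "schema_potential": 0.15,
    "internal_linking": 0.20,
}


def structural_clarity(markdown: str) -> float:
    """Share of heading-to-heading transitions that do not skip a level (H2 -> H4 does)."""
    levels = [len(m.group(1)) for m in HEADING.finditer(markdown)]
    if len(levels) < 2:
        return 1.0 if levels else 0.0
    ok = sum(1 for prev, cur in zip(levels, levels[1:]) if cur <= prev + 1)
    return ok / (len(levels) - 1)


def internal_linking(markdown: str, target: int = 3) -> float:
    """Relative links to other pages, saturating at `target` links."""
    return min(len(INTERNAL_LINK.findall(markdown)) / target, 1.0)


def readiness(scores: dict[str, float]) -> float:
    """Weighted 0-100 score; missing factors count as zero."""
    return round(100 * sum(w * scores.get(name, 0.0) for name, w in WEIGHTS.items()), 1)


page = ("# Install\n## Prerequisites\n#### Details\n## Steps\n"
        "See [auth](auth.md) and [site](https://example.com).")
scores = {"structural_clarity": structural_clarity(page),
          "internal_linking": internal_linking(page)}
print(scores, readiness(scores))
```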
Iantosca's vision takes this further with formal standards like iiRDS, which attach delivery metadata (content type, audience, lifecycle phase, product applicability) to each piece of documentation in a machine-readable format. iiRDS, MCP, and structured exports serve different purposes: iiRDS is a metadata classification standard, MCP is a tool interaction protocol, and structured exports are a data format. But they share a direction: AI systems work better when they can reason about what content means and who it is for, not just what it says.
Passive intelligence: what the system finds without asking
Most of the conversation around AI writing is about actively prompting — you ask, AI answers. But structured content enables a complementary pattern: the system passively surfacing findings that you act on when relevant.
Community detection combines structural relationships — shared components, cross-references — with semantic similarity to discover topic clusters you did not explicitly organize. A project with 200 topics might reveal that 14 of them form a natural cluster around authentication — not because anyone tagged them that way, but because the system found structural and semantic overlap that manual organization would miss.
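A minimal sketch of the idea using label propagation, one simple community detection algorithm. Production systems often use Louvain or Leiden instead, and this is not necessarily what any particular tool runs. The edge weighting is an assumption: each structural link counts 1.0, and semantic similarity of at least 0.75 adds half its score.

```python
from collections import defaultdict


def combined_edges(structural, similarity, min_sim=0.75, semantic_weight=0.5):
    """Merge structural links and semantic similarity into weighted edges.

    structural: iterable of (topic_a, topic_b) pairs (shared component or cross-reference)
    similarity: dict mapping (topic_a, topic_b) -> cosine similarity
    """
    weights = defaultdict(float)
    for a, b in structural:
        if a != b:
            weights[frozenset((a, b))] += 1.0
    for (a, b), sim in similarity.items():
        if a != b and sim >= min_sim:
            weights[frozenset((a, b))] += semantic_weight * sim
    return weights


def label_propagation(weights, max_iter=20):
    """Each topic repeatedly adopts the label with the most edge weight among its
    neighbors; ties go to the smallest label so results are deterministic."""
    neighbors = defaultdict(dict)
    for pair, w in weights.items():
        a, b = tuple(pair)
        neighbors[a][b] = w
        neighbors[b][a] = w
    labels = {node: node for node in neighbors}
    for _ in range(max_iter):
        changed = False
        for node in sorted(neighbors):
            score = defaultdict(float)
            for other, w in neighbors[node].items():
                score[labels[other]] += w
            best = max(score.values())
            new_label = min(label for label, s in score.items() if s == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    communities = defaultdict(set)
    for node, label in labels.items():
        communities[label].add(node)
    return sorted(communities.values(), key=sorted)


edges = combined_edges(
    structural=[("login", "tokens"), ("tokens", "sso"), ("login", "sso"), ("invoices", "plans")],
    similarity={("plans", "refunds"): 0.9, ("login", "refunds"): 0.3},
)
print(label_propagation(edges))
```

Here "refunds" has no structural link at all; it joins the billing cluster purely on semantic similarity, while the weak 0.3 similarity to "login" is ignored.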
Passive quality findings work the same way. Similarity scoring flags paragraphs that express the same idea across multiple topics — even when the wording differs — as candidates for component extraction. Missing cross-reference suggestions surface when two topics discuss the same concept but do not link to each other. Terminology drift detection catches "setup" in one topic and "set-up" in another.
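Terminology drift of the "setup" / "set up" / "set-up" kind can be detected by collapsing spaces and hyphens and grouping the surface forms. A minimal sketch. Many style guides treat "set up" (verb) and "setup" (noun) as both correct, so the output is a review list, not an auto-fix.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z]+(?:-[a-z]+)*")


def find_drift(topics: dict[str, str]) -> dict[str, dict[str, set[str]]]:
    """Group spellings that differ only by a space or hyphen ("setup",
    "set up", "set-up") and return the groups with more than one spelling,
    mapping each spelling to the topics that use it."""
    forms = defaultdict(lambda: defaultdict(set))
    for topic_id, text in topics.items():
        tokens = WORD.findall(text.lower())
        for token in tokens:
            forms[token.replace("-", "")][token].add(topic_id)
        for first, second in zip(tokens, tokens[1:]):
            if "-" not in first and "-" not in second:
                forms[first + second][f"{first} {second}"].add(topic_id)
    return {key: dict(variants) for key, variants in forms.items() if len(variants) > 1}


topics = {
    "t1": "Run the setup wizard.",
    "t2": "Set up the CLI.",
    "t3": "The set-up takes 5 minutes.",
}
print(find_drift(topics))
```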
Structure suggestions take this further. If your content mentions "For administrators" and "For developers" across multiple topics but you have not defined an Audience condition dimension, the system can detect those patterns and propose the dimension with its values. If a product name appears hardcoded in 8 topics, the system suggests extracting it as a variable. If community clusters reveal natural topic groupings, the system proposes taxonomy tags. If a single map contains multiple distinct topic clusters, the system suggests splitting it into focused sub-maps aligned with the clusters it detected. Each suggestion carries a confidence tier — explicit pattern match, cross-topic frequency analysis, or AI classification — so you know how much weight to give it.
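Two of these suggestions can be sketched with simple pattern counts. The audience vocabulary, candidate literals, and thresholds below are assumptions (the 8-topic threshold echoes the example above). Each suggestion carries the confidence tier that produced it.

```python
import re
from collections import defaultdict
from typing import Optional

AUDIENCE_TERMS = ("administrators", "developers", "operators")  # hypothetical vocabulary
AUDIENCE_PATTERN = re.compile(r"\bFor (" + "|".join(AUDIENCE_TERMS) + r")\b")


def suggest_audience_dimension(topics: dict[str, str], defined_dimensions: set[str],
                               min_topics: int = 2) -> Optional[dict]:
    """Propose an Audience condition dimension when "For <audience>" phrasing
    recurs across topics and no such dimension exists yet."""
    if "Audience" in defined_dimensions:
        return None
    seen = defaultdict(set)  # audience value -> topics that mention it
    for topic_id, text in topics.items():
        for value in AUDIENCE_PATTERN.findall(text):
            seen[value].add(topic_id)
    mentioning = set().union(*seen.values())
    if len(seen) >= 2 and len(mentioning) >= min_topics:
        return {"dimension": "Audience", "values": sorted(seen),
                "tier": "explicit pattern match"}
    return None


def suggest_variables(topics: dict[str, str], candidates: list[str],
                      min_topics: int = 8) -> list[dict]:
    """Flag literal strings (a product name, a version) hardcoded in many topics."""
    suggestions = []
    for literal in candidates:
        hits = sorted(t for t, text in topics.items() if literal in text)
        if len(hits) >= min_topics:
            suggestions.append({"literal": literal, "topics": hits,
                                "tier": "cross-topic frequency analysis"})
    return suggestions


topics = {f"t{i}": "Acme Cloud stores your data." for i in range(8)}
topics["t0"] += " For administrators: enable SSO."
topics["t1"] += " For developers: install the SDK."
print(suggest_audience_dimension(topics, defined_dimensions=set()))
print(suggest_variables(topics, candidates=["Acme Cloud", "SDK"]))
```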
The same principle extends to content health. Reader queries that return zero results cluster into content gaps — real questions your documentation does not answer. A nightly scan detects stale topics, expired verifications, near-duplicates that should be merged, orphan topics not assigned to any map, and published pages scoring poorly in AI-search readiness. Each finding becomes a recommendation: update, verify, merge, expand, archive. The writer reviews and acts; the system never modifies content autonomously.
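The nightly scan is essentially a set of rules over topic metadata. A sketch with hypothetical `Topic` fields and thresholds (one year for staleness, a readiness score below 50). The mapping from finding to recommendation is also an assumption, and near-duplicate merging is left to a separate similarity check. The function only returns findings; it never edits content.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional


@dataclass
class Topic:
    """Hypothetical per-topic metadata the scan reads."""
    id: str
    updated: date
    verified_until: Optional[date] = None
    maps: set[str] = field(default_factory=set)
    published: bool = False
    readiness: float = 100.0  # AI-search readiness score, 0-100


def nightly_scan(topics: list[Topic], today: date,
                 stale_after: timedelta = timedelta(days=365),
                 min_readiness: float = 50.0) -> list[tuple[str, str, str]]:
    """Return (topic id, recommendation, reason) findings; never modifies topics."""
    findings = []
    for t in topics:
        age = today - t.updated
        if age > stale_after:
            findings.append((t.id, "update", f"not updated in {age.days} days"))
        if t.verified_until is not None and t.verified_until < today:
            findings.append((t.id, "verify", "verification expired"))
        if not t.maps:
            findings.append((t.id, "archive", "orphan: not assigned to any map"))
        if t.published and t.readiness < min_readiness:
            findings.append((t.id, "expand", f"AI-search readiness {t.readiness:.0f}"))
    return findings


topics = [
    Topic("a", date(2024, 1, 1), date(2025, 1, 1), {"main"}),
    Topic("b", date(2026, 2, 1), published=True, readiness=30.0),
    Topic("c", date(2026, 2, 1), date(2027, 1, 1), {"main"}, published=True, readiness=80.0),
]
for finding in nightly_scan(topics, today=date(2026, 3, 1)):
    print(finding)
```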
These signals converge in the writing interface. Ask "what needs attention" in the chat panel and the system returns lifecycle recommendations, open content gaps with confidence scores, and the project's overall health metrics — without leaving the editor. The writer gets the same information a dashboard would show, in the context where they can act on it.
None of these interrupt you. They appear in a panel tab or a dashboard section, silently accumulating. You review them when you want to, accept or dismiss each finding, and move on. This is the opposite of the AI-writes-your-docs narrative. The AI reads your content architecture and tells you what it found. You decide what to do about it.
You do not need the enterprise stack to get the principle
Iantosca makes a compelling case for why structured content is the foundation of trustworthy AI in documentation. His architecture — knowledge graphs, ontologies, SPARQL queries, SHACL constraints, deterministic validation — is rigorous and correct. If you are managing millions of words across regulated domains, you need that level of governance.
But the principles do not require the full stack. The core insights translate to lighter implementations — with trade-offs worth understanding:
- The LLM generates; the structure constrains. Iantosca's version uses OWL ontologies that support formal logical inference, deriving new facts from explicitly modeled rules. A content graph built on Postgres and pgvector does entity validation and similarity search, not formal reasoning. The gap is real: SQL can express a rule like "if this API is deprecated and this task references it, flag the task," but only as a query someone writes by hand for that one case; there is no reasoner deriving such conclusions from modeled axioms. The foundational principle, that structure external to the LLM is what makes AI output verifiable, still applies at both levels of formality.
- Context is resolved, not stuffed. SPARQL queries over an RDF graph support semantic reasoning, transitive relationships, and class hierarchies. SQL joins traversing component references are relational lookups — simpler and less expressive, but sufficient for the most common documentation queries: what references this component, what conditions apply, what topics are similar.
- Provenance matters. Whether tracked through iiRDS metadata or through interaction logging with full prompt/output/outcome triples, knowing where an AI answer came from — and whether the source is current — is what separates trustworthy AI from plausible AI.
- You validate what AI cannot. Both the enterprise and the small-team version agree on this: AI does not replace the person who tests the API, runs the procedure, and verifies the screenshot. It makes you more productive by handling structural consistency automatically.
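Both the relational traversal and its limits fit in a few lines of SQL. This sketch uses SQLite so it runs without a server (Postgres supports the same `WITH RECURSIVE` syntax), and the table layout is hypothetical. The where-used query follows nested component references transitively. The deprecation rule from the first bullet works too, but only because someone wrote it by hand as a join; nothing derives it from modeled axioms the way an ontology reasoner would.

```python
import sqlite3

SCHEMA = """
CREATE TABLE topic (id TEXT PRIMARY KEY, kind TEXT);  -- kind: 'task', 'component', ...
CREATE TABLE ref (source TEXT, target TEXT);          -- source includes or links target
CREATE TABLE api (id TEXT PRIMARY KEY, deprecated INTEGER);
CREATE TABLE api_ref (topic TEXT, api TEXT);          -- topic documents or calls api
"""

# Transitive where-used: UNION (not UNION ALL) drops repeats, so cycles terminate.
WHERE_USED = """
WITH RECURSIVE users(id) AS (
    SELECT source FROM ref WHERE target = ?
    UNION
    SELECT ref.source FROM ref JOIN users ON ref.target = users.id
)
SELECT id FROM users ORDER BY id
"""

# The deprecation rule exists only because it was written by hand as a join.
DEPRECATED_TASKS = """
SELECT DISTINCT t.id FROM topic t
JOIN api_ref r ON r.topic = t.id
JOIN api a ON a.id = r.api
WHERE t.kind = 'task' AND a.deprecated = 1
ORDER BY t.id
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.executemany("INSERT INTO topic VALUES (?, ?)", [
    ("auth-snippet", "component"), ("sso-component", "component"),
    ("install", "task"), ("sso-setup", "task"), ("billing", "task"),
])
db.executemany("INSERT INTO ref VALUES (?, ?)", [
    ("install", "auth-snippet"), ("sso-component", "auth-snippet"),
    ("sso-setup", "sso-component"),
])
db.executemany("INSERT INTO api VALUES (?, ?)", [("v1/tokens", 1), ("v2/tokens", 0)])
db.executemany("INSERT INTO api_ref VALUES (?, ?)",
               [("install", "v1/tokens"), ("billing", "v2/tokens")])

print([row[0] for row in db.execute(WHERE_USED, ("auth-snippet",))])
# ['install', 'sso-component', 'sso-setup']
print([row[0] for row in db.execute(DEPRECATED_TASKS)])
# ['install']
```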
The difference is who can actually use it. A 5-person docs team is not going to deploy GraphDB, PoolParty, and a DITA-OT pipeline. But they can adopt structured authoring — components, conditions, variables, cross-references — in a modern web-based tool and get many of the structural benefits that make AI reliable, even without formal ontological reasoning.
What this does not solve
None of this is a universal solution. Three honest limitations:
Domain verification is still on you. Automated checks can catch broken links, unreachable API endpoints, and malformed code samples. But they cannot tell you whether the API returns the data the spec claims, whether the CLI flags produce the documented behavior, or whether the screenshot matches the current UI. You still have to test the procedure and verify the result. Structure-aware AI narrows the mechanical verification surface — you spend less time discovering that a URL went 404 and more time confirming that the procedure works — but domain expertise remains the most valuable skill in documentation.
Small projects may not need this. If your documentation is 30 pages with no content reuse, no conditions, and a single publishing channel, the structural overhead is not worth it. A flat Markdown repo with a good AI coding assistant may genuinely be the better choice. GitBook or a static site generator handles that well.
AI does not automatically make you faster. The METR perception gap — 19% slower in reality, 20% faster in perception — was measured with early-2025 AI tools, and METR's own February 2026 follow-up suggested the gap may be narrowing. But temper your enthusiasm: AI assistance does not automatically mean faster work, even with structural awareness. The value is in catching things you would miss, not in raw speed.
Is your content ready?
Six questions tell you whether your docs can benefit from structure-aware AI or whether you are stuck with flat-text assistance. These apply regardless of which tool you use.
- Can AI warn you before a rewrite breaks other pages? That requires reusable components with where-used tracking. If your reuse is copy-paste, AI has no way to know a paragraph exists anywhere else.
- Can AI validate that generated content respects audience conditions? That requires conditional content with explicit dimension and value pairs. If conditions live in the writer's head or in ad-hoc comments, AI has nothing to validate against.
- Can AI catch hardcoded values that should be variables? Product names, version numbers, and API URLs change. If they resolve as variables at publish time, AI can validate them. If they are strings scattered across files, AI cannot distinguish a value from its surrounding text.
- Can AI traverse relationships between topics? That requires cross-references tracked as structured links. Inline URLs that break silently when the target moves give AI no relationship graph to reason about.
- Can AI agents parse your published output? llms.txt, per-page Markdown URLs, and structured metadata make documentation machine-consumable. AI-driven traffic accounted for 41% of page views on GitBook-hosted docs by December 2025. If agents cannot parse your output, your docs are invisible to a growing share of their audience.
- Do reader queries feed back into your content planning? When readers ask questions your docs cannot answer, that signal should surface as a content gap. If your reader AI logs are disconnected from your authoring workflow, you are blind to what your audience actually needs.
If you check all six, your content architecture is ready for structure-aware AI. If you check zero, an AI writing assistant will still help with drafting — but the verification bottleneck will remain entirely manual.
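The reader-query feedback loop in the last question can start as simply as this. A sketch that greedily clusters zero-result reader queries by term overlap, assuming a hypothetical stopword list, a 0.5 Jaccard threshold, the most frequent phrasing as the suggested title, and the cluster's share of all zero-result queries as its confidence score.

```python
import re
from collections import Counter

STOPWORDS = {"how", "do", "i", "to", "the", "a", "an", "in", "my",
             "can", "what", "is", "for", "with"}


def query_terms(query: str) -> frozenset[str]:
    return frozenset(w for w in re.findall(r"[a-z0-9]+", query.lower())
                     if w not in STOPWORDS)


def cluster_gaps(queries: list[str], min_overlap: float = 0.5,
                 min_size: int = 2) -> list[dict]:
    """Greedy single pass: a query joins the first cluster whose accumulated
    terms it overlaps by at least `min_overlap` (Jaccard), else starts a new one."""
    clusters = []  # each entry: [set of terms seen so far, list of queries]
    for query in queries:
        terms = query_terms(query)
        for cluster in clusters:
            cluster_terms = cluster[0]
            if terms and len(terms & cluster_terms) / len(terms | cluster_terms) >= min_overlap:
                cluster_terms |= terms  # grows the cluster's term set in place
                cluster[1].append(query)
                break
        else:
            clusters.append([set(terms), [query]])
    gaps = []
    for _, members in clusters:
        if len(members) >= min_size:
            gaps.append({
                "title": Counter(members).most_common(1)[0][0],  # most frequent phrasing
                "queries": members,
                "confidence": round(len(members) / len(queries), 2),
            })
    return sorted(gaps, key=lambda gap: -gap["confidence"])


zero_result = ["how do I rotate api keys", "rotate api keys", "rotate api keys expiry",
               "export invoices to csv", "export invoices csv", "dark mode"]
for gap in cluster_gaps(zero_result):
    print(gap["title"], gap["confidence"])
```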
Your docs are infrastructure now
This is not AI writing your documentation. It is your content architecture making AI useful.
The craft of structuring documentation — defining components, setting up conditions, maintaining cross-references, designing content maps — becomes the foundation that determines how much AI can help. Iantosca calls it "making meaning explicit, parsable, and durable." I call it structured authoring without the XML. The principle is the same: structure your docs, and AI has something to reason about. Leave them flat, and AI is guessing.
Context assembly becomes partially automated. Verification becomes partially structural. Your domain expertise gets applied to the hard problems (does this API actually work the way the docs say it does?) instead of the mechanical ones (did I break consistency across 6 shared components?).
You are becoming a documentation architect whether you signed up for it or not. The structure you build is what powers every AI interaction — for human writers and machine agents alike. The question is not whether AI will change your workflow. It is whether your content architecture is ready for it, or whether you are asking AI to reason about flat text and hoping for the best.