Infrastructure as Reasoning: A New Paradigm for IaC
Most infrastructure teams are still doing traditional Infrastructure as Code: code is the source of truth; documentation is optional. Some teams have moved further, keeping documentation in git alongside the code and treating docs as a code concern. That’s progress.
But there’s a progression beyond that.
I started in the traditional place. Code was authoritative. Documentation lagged. Then I moved to docs-as-code — design documents in git, version controlled, reviewed like code. Better. But code and docs were still separate concerns. When Ansible failed, I’d fix the code, then update the docs. Days later. One would always lag.
Then I realized: what if the design documents weren’t describing what Ansible does, but prescribing it? What if the Markdown files were the configuration source that Ansible read from?
That changed everything. I moved from Infrastructure as Code to Design-Source IaC — the design is authoritative, and the automation executes what the design specifies.
But that’s still just the intermediate step.
The real leap is what happens when you add Claude into that loop. When Ansible fails, Claude reads the error, researches the root cause against the entire design corpus, figures out whether the issue is architectural (design constraint not documented) or operational (automation incomplete), and proposes fixes to both simultaneously. You validate. Both get updated.
That’s Infrastructure as Reasoning.
That’s the paradigm shift.
Design-Source IaC: The Intermediate Step
architecture-source is a VMware Cloud Foundation 9 lab — complex infrastructure. Nested ESXi hosts, dual domains, NSX VPC networking, VKS Kubernetes clusters, a gateway running DNS, NTP, certificate authority, BGP peering to the border. Dozens of interdependent subsystems.
Here’s how I structured it:
Markdown files in git are the single source of truth. Design docs, requirements, constraints, delivery guides, operational procedures. Everything. The design is not describing the infrastructure. It is prescribing it. Ansible playbooks and roles read from this documented architecture. The design is the configuration source.
This is the leap from traditional IaC. The code doesn’t drive the design. The design drives the code. When I need to change how the infrastructure works, I update the Markdown. Then Ansible reads those changes and executes them.
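To make the mechanism concrete, here is a minimal sketch of the design-as-source idea: extract structured parameters from a Markdown design doc and emit a vars file Ansible can load. The doc layout (a `- key: value` bullet list under a Parameters heading) and every file and key name here are illustrative assumptions, not the repo’s actual format.

```python
"""Sketch: parse parameters out of a Markdown design doc so Ansible
can consume the design directly as its configuration source."""
import json
import re

# An illustrative fragment of a physical design doc (assumed layout).
DESIGN_MD = """\
# Gateway Physical Design

## Parameters

- mgmt_vlan: 100
- uplink_vlan: 200
- dual_homed: true
"""

def extract_params(markdown: str) -> dict:
    """Collect 'key: value' bullets, coercing ints and booleans."""
    params = {}
    for key, raw in re.findall(r"^- (\w+):\s*(.+)$", markdown, re.M):
        if raw in ("true", "false"):
            params[key] = raw == "true"
        elif raw.isdigit():
            params[key] = int(raw)
        else:
            params[key] = raw
    return params

if __name__ == "__main__":
    # Ansible can consume this via vars_files or --extra-vars "@gateway.json"
    print(json.dumps(extract_params(DESIGN_MD), indent=2))
```

The point is the direction of flow: editing the Markdown changes what the automation executes, not the other way around.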
This alone is progress. Design becomes authoritative. Ansible is consistent with the design because it’s reading the design. Claude Opus reads the entire corpus — the conceptual, logical, and physical designs, the delivery guide, and the constraints — and regenerates design documentation, implementation guides, and operational procedures that reflect the current design.
But this is still reactive. I update design, Ansible executes, Opus regenerates docs. Clean, but human-driven.
Infrastructure as Reasoning: The Paradigm Shift
Then I closed the loop with Claude.
When I deploy and something breaks, Ansible throws an error. I read it. I feed it to Claude.
Claude reads the error. It researches the root cause — reads the Ansible logs, traces the failure, checks what assumptions the task made. Then it reads the entire design corpus indexed by Local Expert. It understands what the system is supposed to do. Then it asks: is this a design issue or an automation issue?
If it’s a design issue, Claude proposes changes to the Markdown. If it’s an automation issue, Claude proposes changes to the playbook. Often it’s both — the design constraint wasn’t documented, so the Ansible task didn’t account for it.
I read Claude’s analysis. I validate the reasoning. If Claude is right, I approve. If it’s wrong, I correct it. Then Claude updates both the Markdown and the Ansible code. Claude Opus regenerates the design documentation (conceptual, logical, and physical designs), implementation guides, and operational procedures. GitHub Actions rebuilds the PDFs.
The next deployment runs against the updated automation and the updated design.
One source of truth. Three outputs. Tighter every cycle. But now the system reasons about itself.
The Proof: One Failure Cycle
I deployed the lab. Ansible failed on the NSX BGP configuration task.
The error: VLAN interface not found on the gateway. The Ansible task was trying to configure BGP on a sub-interface that didn’t exist.
I fed the error to Claude. Claude read the task, read the error, checked the design docs. Here’s what it found: the physical design doc specified VLANs 100-200 for infrastructure. The delivery guide said to create the gateway as a standalone VM. But the logical design had the gateway as dual-homed — one NIC to the external network, one NIC to the infrastructure VLAN. The Ansible task assumed the dual-homed setup, but the delivery guide didn’t actually create that second NIC.
So: was the design wrong, or was the automation wrong?
Claude checked my design intent against the logical design. The intent was correct — the gateway should be dual-homed. But the delivery guide was incomplete. Claude proposed two changes:
- Update the physical design docs to explicitly specify the gateway’s dual-NIC configuration and VLAN tagging
- Update the delivery guide task to create both NICs with the correct VLAN sub-interfaces
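A preflight check of the kind this failure suggests: before the BGP task runs, verify that every VLAN sub-interface the design prescribes actually exists on the gateway. The design dict and interface names below are illustrative assumptions modeled on the failure described above, not the actual playbook.

```python
"""Sketch: catch the 'sub-interface doesn't exist' gap before the
BGP task trips over it."""

def missing_subinterfaces(design: dict, actual_ifaces: set[str]) -> list[str]:
    """Return design-prescribed VLAN sub-interfaces absent from the host."""
    wanted = [f"{nic['device']}.{nic['vlan']}"
              for nic in design["gateway_nics"] if nic.get("vlan")]
    return [name for name in wanted if name not in actual_ifaces]

# The corrected design intent: dual-homed gateway, second NIC tagged
# on the infrastructure VLAN.
design = {"gateway_nics": [
    {"device": "eth0", "vlan": None},   # external network, untagged
    {"device": "eth1", "vlan": 100},    # infrastructure VLAN sub-interface
]}

# What the pre-fix delivery guide actually produced: only eth0.
print(missing_subinterfaces(design, {"eth0"}))   # -> ['eth1.100']
```

A check like this fails fast with the real cause (the missing NIC) instead of a downstream BGP error.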
I read Claude’s analysis. It was right. The design was correct; the automation step was missing detail. I approved.
Claude updated the Markdown. Claude updated the Ansible playbook. Claude Opus regenerated the design documentation, the implementation guide, and the operational procedures, all reflecting the corrected design.
I re-ran Ansible. Clean.
That’s not fixing a bug and documenting it later. That’s Claude reading the failure, understanding the design, figuring out whether the issue was architectural or operational, proposing fixes to both simultaneously, and me validating the reasoning. One loop. One fix. Both code and docs updated.
The Journey: From Reactive to Reasoning
Traditional IaC: Code is authoritative. Docs are optional. Failures get fixed in code; documentation updates are afterthoughts or never happen.
IaC + Docs-as-Code: Code and docs are version controlled together. Better, but still separate concerns. When something breaks, you fix code and docs independently. One always lags.
Design-Source IaC: Design becomes the authoritative source. Ansible reads from it. When you change the design, automation executes the new design. Opus regenerates formal documentation from that design. Tighter, more coherent. But still reactive — you initiate every change.
Infrastructure as Reasoning: Claude enters the loop. When Ansible fails, Claude doesn’t just fix the code. It reads the error, understands the design, and asks: is this a constraint the design should document, or an incomplete automation step? Or both? Claude proposes improvements to both the design and the automation simultaneously. You validate. Both get updated together.
The loop is closed. A failure doesn’t just get fixed; it improves the design understanding and the automation together. The system learns. The documentation doesn’t lag, because it’s maintained as part of the operational feedback loop rather than as an artifact that drifts away.
A New Paradigm: Infrastructure as Reasoning
Most teams are still at step one: Infrastructure as Code where code is the source. Some have progressed to Design-Source IaC where design documents are the configuration source Ansible reads from. That’s real progress — design is authoritative, automation is coherent with it.
But Infrastructure as Reasoning is the next step. It’s a fundamental shift in how infrastructure systems learn and maintain themselves.
The key difference: the system actively reasons about failures.
Traditional approaches: something breaks → fix the code → update docs later (or never).
Design-Source IaC: something breaks → fix the design → Ansible reads the updated design → Opus regenerates docs.
Infrastructure as Reasoning: something breaks → Claude reads the error, researches the root cause, checks the entire design corpus, determines whether this is architectural (design needs to account for it) or operational (automation needs updating), proposes fixes to both simultaneously → you validate → both design and automation improve together.
The system learns. Every failure teaches it something about what the design should account for and what the automation should reflect. Design and automation improve together, not sequentially.
This only works with three things in place:
- Design as source of truth — Markdown docs that are authoritative, prescribing what automation executes
- AI that can reason — Claude reading the entire corpus, understanding ripple effects, spotting contradictions across design, docs, and automation (Opus’s 1M context is crucial here)
- Human design authority — You decide what’s architecturally correct. Claude executes. Neither works alone.
The economics shift too: instead of hiring consultants to re-document the system after every change, a technical authority and Claude keep it coherent continuously. The documentation doesn’t go stale because it’s maintained as part of the operational system. The design improves because every failure teaches it something.
The Journey Completes
If you’re still at traditional IaC, your next step is clear: move design into source control as prescriptive documentation that automation reads from. That’s Design-Source IaC. It’s achievable without AI and significantly improves coherence.
If you’ve achieved Design-Source IaC, you have the foundation for the next leap: add Claude into the feedback loop. When failures happen, Claude researches them against the entire design corpus. You validate. Both design and automation improve together.
Ansible failed. I read the error. I asked Claude what it meant.
Claude read the logs, traced the assumption, checked the design docs, figured out whether this was an architectural flaw or an automation gap, and proposed fixes to both. I validated the reasoning. Claude updated the Markdown and the playbook. Opus regenerated the formal documentation.
I re-ran Ansible. It worked.
That’s Infrastructure as Reasoning. Not me chasing my tail between code and docs. Not hiring someone to re-document the system every time it changes. Claude doing the research and the work. Me validating the decisions. Both of us keeping the system coherent.
Infrastructure doesn’t have to degrade as it ages. It can actively learn from failures and improve both its design understanding and its automation together — but only if you have the right layers in place: design as source, AI that can reason about it, and a human with authority deciding what’s architecturally correct.