Daniel sent us this one — and it's a topic that hits differently now than it would have even two years ago. He's asking about Architectural Decision Records, ADRs, a documentation format designed to capture not just what architectural choices were made in a software project, but why they were made, when they were made, and what tradeoffs were on the table at the time. The canonical reference is adr.github.io, there's a popular lean template called MADR — Markdown Any Decision Records — and the whole concept traces back to a 2011 blog post by Michael Nygard. Daniel's angle is specifically about why this format has taken on new life in the era of LLM-assisted coding and spec-driven development. Because an AI agent has no institutional memory. And ADRs might be one of the more elegant answers to that problem.
I'm Herman Poppleberry, by the way, for anyone new here. And yeah, this one's been sitting in my reading pile for a while. The framing that really got me thinking was this: imagine you're an AI coding assistant, you've been dropped into a codebase mid-project, and someone asks you to extend the authentication layer. You look at the code, you see JWT tokens, you see a particular session handling pattern, and you think — I could refactor this, there's a cleaner approach right here. But what you don't know is that six months ago, the team explicitly considered that cleaner approach, ran benchmarks, found it introduced a latency problem under their specific load profile, and decided against it. That decision is nowhere in the code. It might be in a Slack thread that's been archived. It might be in someone's memory. But it's not in the repository in any structured form.
Which means the AI just helpfully reintroduces the exact problem the team already solved. I love that. Very efficient use of everyone's time.
It's not even the AI's fault, that's the thing. You can't reason about constraints you've never been given. And this is where ADRs become genuinely interesting — not as a bureaucratic documentation exercise, but as a mechanism for making institutional knowledge addressable.
By the way, today's episode is powered by Claude Sonnet four point six, which feels appropriate given what we're about to get into.
Very on-brand. An AI helping explain how to give AI better context. There's something almost recursive about that.
Let's not pull on that thread or we'll be here all week. So — where do you want to start? The format itself, or the problem it's solving?
The problem first, I think, because I don't want people to hear "documentation standard" and immediately check out. The problem is actually fascinating. Software architecture is full of decisions that look wrong in isolation. You look at a codebase and you see something that appears to be over-engineered, or under-engineered, or just weird, and the natural instinct is to fix it. But a lot of those apparent mistakes are load-bearing constraints. They exist because of a regulatory requirement, or a performance characteristic, or a dependency on a third-party system that has since been deprecated. The code doesn't tell you any of that. The code tells you what is happening, not why it was chosen over the alternatives.
The older the codebase, the worse this gets. Because the people who made those decisions have moved on, or they've forgotten the specifics, or the original context has shifted so much that even they can't fully reconstruct the reasoning.
And there's a term for this — some people call it "knowledge vaporization." The decision gets made, the reasoning exists briefly in someone's head and maybe in a meeting, and then it just... What you're left with is the artifact of the decision without the rationale.
Which is a very polite way of saying future developers are left guessing.
And the guessing is expensive. Not just in time — though it is expensive in time — but in risk. Because the guess might be wrong in ways that don't surface immediately. There's actually a classic example of this in the space shuttle software development story. The original Shuttle avionics code had certain constraints baked in that looked completely arbitrary to engineers who joined the program later. Some of those constraints existed because of hardware characteristics from the early seventies that had long since been superseded. But because the reasoning was never formally recorded, nobody was confident enough to remove them. You end up with code that's carrying dead weight because the cost of being wrong is too high to risk the guess.
That's a great illustration of the asymmetry. The cost of writing down the reasoning is low. The cost of not having it, compounded over years, can be enormous.
The thing is, it's not just legacy systems. This happens fast. Teams move quickly, decisions get made in Slack, the reasoning lives in a thread for three months before it's archived, and then it's effectively gone. The half-life of institutional memory is surprisingly short even at healthy, well-run teams.
The fix has to be something concrete, not just a vague "be more careful."
Right — but "write down why you made the decision" sounds almost insultingly simple. What does the fix actually look like in practice?
It is simple, which I think is part of why it took so long to catch on. Michael Nygard's original framing from 2011 was basically: keep a collection of records for each architecturally significant decision. Each record covers the context — what situation prompted this choice — the decision itself, and the consequences, both positive and negative. That's the core. You're not writing a design document. You're not producing a formal specification. You're writing something closer to a structured journal entry, made at the moment the decision is live, when the reasoning is freshest.
The moment part matters, I assume. Because if you're reconstructing the rationale six months later, you're already doing archaeology.
And the format has been standardized enough that there's now a whole ecosystem around it — adr.github.io is the canonical reference point, and MADR, the Markdown Any Decision Records template, has become the lean default for a lot of teams. Over two thousand four hundred stars on GitHub as of this month, which for a documentation template is significant.
That's not a niche curiosity number. That's people actually using it.
It's adoption. And the key thing MADR gets right is that it resists bloat. The temptation with any documentation format is to keep adding fields until you've recreated the bureaucracy you were trying to escape. MADR holds the line. Title, status, context, decision, consequences. You can add alternatives considered, you can add references, but the minimum viable ADR is minimal.
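For listeners who want to see the shape of it, here's roughly what a minimal MADR-style record looks like, filled in with the PostgreSQL example from this episode. The exact headings vary a little between template versions, so treat this as an illustrative sketch rather than the official format:

```markdown
# Use PostgreSQL for the primary datastore

Status: accepted

## Context
Our load profile is read-heavy and the data relationships are complex.
A benchmark at ten thousand concurrent reads showed a meaningful latency
difference between candidates; document-store flexibility was not buying
us anything for this data model.

## Decision
We will use PostgreSQL rather than MongoDB.

## Consequences
- Predictable latency under our current load profile.
- We give up schema flexibility; migrations become a formal step.
```

An optional "Alternatives considered" section can be added, but the five fields above are the minimum viable record.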
The format solves the "what to write" problem. The harder question is probably the "when does this warrant an ADR" problem.
Which is exactly where we should go next.
The "when" question is the one that actually trips teams up, I think. Because the format is clear, the rationale for having it is clear, but if you set the bar too low you end up documenting every variable name choice, and if you set it too high you only write ADRs for the decisions that were already obvious enough to survive without one.
There's a useful heuristic I've seen float around, which is: if the decision would generate meaningful disagreement among senior engineers, it warrants an ADR. Not because disagreement means the decision is wrong, but because disagreement is a signal that the context is doing real work. The choice isn't obvious from first principles. Someone reasonable could look at the same situation and land differently. That's exactly when you need the record.
I like that framing. It also implicitly captures things like infrastructure choices, third-party dependencies, API design patterns — decisions where the tradeoffs are real and the alternatives aren't obviously worse.
Right, as opposed to "we used camelCase for variable names." Nobody's writing an ADR for that.
Although, and I say this only half-jokingly, I have seen teams spend forty-five minutes arguing about camelCase versus snake_case in a pull request. So maybe the bar is lower than we think.
And the context section is where most of the value actually lives, isn't it. Not the decision itself.
By a wide margin. Think about it this way — the decision is often just a sentence. "We chose PostgreSQL over MongoDB." But the context is where you learn that the team evaluated this under a specific read-heavy load profile, that the data relationships were complex enough that document-store flexibility wasn't actually buying them anything, that they ran a benchmark at ten thousand concurrent reads and the latency difference was meaningful. That's the information that makes the decision legible. Without it, you just have a preference.
With it, you have something a new engineer — or an AI agent — can actually reason about. Because now you know not just what was chosen but under what conditions that choice holds.
Which matters enormously when conditions change. If your load profile shifts, if the data model evolves, if a new version of the technology closes the performance gap — the ADR tells you exactly which assumptions to revisit. It becomes a checklist for when to reopen the decision.
That's the part that doesn't get talked about enough. ADRs aren't just archaeology. They're decision hygiene going forward.
The consequences section does some of that work explicitly. A good ADR names the tradeoffs you accepted. Not just "this was better," but "this was better along these dimensions, and we paid this cost." A team switching database technologies, say, would document not just the performance benchmarks that favored the new system, but the migration costs, the tooling changes, the learning curve, the operational overhead. All of that goes in consequences.
When someone two years later is wondering why the migration happened, they're not guessing at the business case — it's right there, with the numbers.
And this is where ADRs differ from traditional architecture documentation. A design document describes the system as it is. An ADR describes a moment of choice. It's temporally anchored in a way that most documentation isn't. You're reading a snapshot of the team's reasoning at a specific point in time, with a specific set of constraints and alternatives on the table.
Which also means it ages differently. A design doc becomes outdated when the system changes. An ADR is never outdated, because it was always about that moment.
That's a really clean way to put it. The status field handles the evolution — an ADR can be marked as proposed, accepted, deprecated, or superseded. So the record doesn't disappear when the decision changes, it just gets a new status and a pointer to whatever replaced it. You end up with a chain of reasoning rather than a single document trying to represent a moving target.
That chain is exactly what makes it machine-readable in a useful way. It's not just narrative — it's structured enough that you can traverse it. Which, incidentally, is exactly what an AI agent needs.
Right — not a wall of prose, but a traversable graph of decisions. Because the problem with feeding an LLM a long design document is that it's undifferentiated. Everything has equal weight. ADRs give you structure that an agent can actually navigate. "Show me all decisions with status superseded." "Show me every ADR that touches the authentication layer." That's a query. You can't query a wiki page.
How does that work in practice, though? Are teams literally building tooling on top of their ADR directories, or is this more of a manual "paste the relevant ones into the prompt" situation right now?
Mostly the latter, honestly. The tooling is early. There are some CLI utilities — adr-tools is the most widely used — that help you create and link records, and a few teams have built lightweight scripts that pull ADRs tagged with certain keywords into their context windows automatically. But the sophisticated retrieval layer, where an agent can semantically search across your decision history and surface the three most relevant ADRs for a given task — that's not really productized yet. It's more of a "the primitives exist, someone will build this" situation.
Which means right now the value is still primarily human-facing, with the AI benefit being more about what you can deliberately include in a prompt.
For most teams, yes. But even that manual version is a meaningful upgrade over the alternative, which is the AI having no context at all. And the teams doing it deliberately — actually maintaining a docs/decisions directory and referencing it when they spin up an agent — are reporting noticeably fewer of those "the AI just undid six months of work" moments.
Commit messages are the obvious comparison here. People have been saying for years that commit messages should capture the why, not just the what. And they're right, but commits are granular. They're tied to a diff. An ADR is tied to a decision, which might span dozens of commits over weeks.
The granularity mismatch is real. A commit message can tell you that a particular function was refactored on a particular date. It can't tell you that the entire session handling architecture was reconsidered because of a compliance requirement, and that three alternatives were evaluated before landing on the current approach. Those are different levels of abstraction, and conflating them is part of why commit history archaeology is so unreliable.
Commits are the what-happened log, and ADRs are the why-we-went-this-direction log. They're complementary, not redundant.
And version-controlling ADRs alongside the code is what makes them durable. They live in the repository. They go through pull requests. They have a review history. A new engineer cloning the repo gets the decision history automatically — they don't have to know which Confluence page to look for, or whether the relevant Slack thread has been archived.
That's the onboarding angle that I think gets undersold. The first week on a new codebase is mostly just trying to figure out which things are intentional and which things are accidents. ADRs collapse that timeline considerably.
There's been some interesting work on this in the context of AI-assisted development specifically. There was a paper on arXiv this month looking at context strategies for automated ADR generation, and one of the findings was that feeding an LLM three to five prior ADRs from the same project significantly improved the quality of newly generated drafts. The model picks up on the team's reasoning style, their level of technical specificity, the kinds of tradeoffs they tend to name explicitly.
The ADRs teach the AI how the team thinks, not just what they decided.
Which is a knock-on effect that I don't think most teams anticipate when they start writing ADRs. You're not just documenting for future humans — you're building a corpus that makes AI assistance progressively more calibrated to your specific context.
It's almost like a fine-tuning signal, except without the fine-tuning. You're shaping the model's outputs through the examples you put in the context window rather than through the weights.
In-context learning, yeah. And it compounds. The more ADRs you have, the more consistent the AI's suggestions become with your team's actual preferences and constraints. Early on, when you have three ADRs, the effect is marginal. When you have thirty, the model has a real sense of how your team reasons about tradeoffs.
The persistence angle matters here too. LLM sessions are stateless by default. Every new conversation starts from scratch. If your architectural context lives in ADRs that are version-controlled and referenceable, you can include the relevant ones in the prompt and the agent is immediately operating with institutional memory it wouldn't otherwise have.
The alternative is what some people are calling "context-free generation" — where the AI produces technically correct code that violates the team's actual constraints because it has no way of knowing those constraints exist. The ADR is the mechanism for closing that gap.
Stateless tool, stateful documentation. The documentation does the work the tool can't.
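The manual "paste the relevant ones into the prompt" workflow can itself be a small script. This is a hedged sketch, not a product: the scoring is naive keyword counting, and the header text and file layout are illustrative assumptions:

```python
# Illustrative sketch of assembling a prompt preamble from the ADRs most
# relevant to a task. A real retrieval layer would use semantic search;
# keyword counting is the simplest stand-in.
from pathlib import Path

def build_context(adr_dir: Path, task_keywords: list[str], limit: int = 5) -> str:
    """Concatenate the ADRs most relevant to a task into one prompt block."""
    scored = []
    for path in sorted(adr_dir.glob("*.md")):
        text = path.read_text()
        hits = sum(text.lower().count(k.lower()) for k in task_keywords)
        if hits:
            scored.append((hits, path.name, text))
    scored.sort(reverse=True)  # most keyword hits first
    chosen = scored[:limit]
    header = "Prior architectural decisions relevant to this task:\n\n"
    return header + "\n\n---\n\n".join(text for _, _, text in chosen)
```

Prepending that block to an agent's instructions is the whole trick: the session is still stateless, but the documentation carries the state in.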
It scales in a way that verbal onboarding doesn't. You can tell a new hire about the important architectural decisions in their first week, but you'll miss things, and they'll forget things, and six months later they're making the same mistakes anyway. An ADR collection is exhaustive and searchable in a way that human memory isn't.
There's something almost ironic about the fact that the solution to AI's memory problem is just... good documentation practice that humans should have been doing anyway.
The AI pressure is just making the cost of not doing it visible. Teams have been getting away with undocumented architecture for years because the humans muddled through. Now there's an agent in the loop that can't muddle — it either has the context or it doesn't. And that gap is where teams are feeling the pain.
That pain point makes sense, but what does it actually look like in practice? I think a lot of listeners are nodding along and then going to open their editor Monday morning without knowing where to start.
Start with MADR: just clone the template and fill in the five fields for the next decision your team is actively debating. Don't retroactively document everything — that's the trap. You'll spend a week writing ADRs for decisions nobody remembers clearly, the quality will be poor, and you'll convince yourself the format doesn't work. Pick the next live decision and write the ADR while the context is still in everyone's heads.
The freshness constraint is doing real work there. An ADR written during the decision captures things that are gone a month later. The rejected alternatives, the specific numbers that swayed the call, the constraint that turned out to be decisive.
The minimum viable version is small. A good MADR can be two hundred words. It doesn't need to be a treatise. Title, status — mark it accepted — a paragraph of context, one sentence of decision, a short list of consequences. You're done. The goal is the habit, not the artifact.
What about teams that feel like they've missed the boat? Like, they've got a five-year-old codebase and the people who made the original decisions have mostly left. Is there a recovery path, or are they just starting from zero?
There's a middle path that works reasonably well, which is what some people call "decision archaeology lite." You're not trying to reconstruct every historical choice with full fidelity — that's the trap I mentioned. Instead, you pick the ten or fifteen decisions that are most actively causing confusion right now. The things your team keeps re-litigating. The parts of the codebase where new engineers consistently make the same wrong assumption. You write ADRs for those, even if the context section is partly reconstructed and marked as such. Something like "based on git history and interviews with the original team, the likely context was..." is better than nothing.
Because even an imperfect record stops the re-litigation cycle.
And it seeds the habit. Once the team has written a few, the format stops feeling like overhead and starts feeling like the natural end state of a decision conversation. "Okay, we've agreed on this — someone write the ADR" becomes a normal close to a technical discussion rather than an extra chore.
The other thing worth naming is that ADRs aren't write-once documents. The status field exists for a reason. When a decision gets revisited, you update the status to superseded, you write a new ADR pointing back to the old one, and the chain stays intact. You're not editing history — you're extending it.
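Concretely, superseding is just a status change plus a cross-reference in both directions. A sketch of what the two linked records might contain — the numbering and link format are illustrative, not mandated by the template:

```markdown
<!-- docs/decisions/0004-use-jwt-sessions.md -->
Status: superseded by [ADR-0012](0012-move-to-opaque-tokens.md)

<!-- docs/decisions/0012-move-to-opaque-tokens.md -->
Status: accepted
Supersedes: [ADR-0004](0004-use-jwt-sessions.md)
```

The old file never gets deleted or rewritten; the chain of pointers is the history.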
Which is a discipline shift for teams used to just updating a wiki page in place and losing the prior reasoning. The ADR model says the old record stays, permanently, and the new one explains why things changed.
If you want to wire this into your workflow rather than relying on discipline alone, the CI/CD angle is worth considering. Some teams add a check that flags pull requests touching architectural layers without a corresponding ADR update. It's not foolproof, but it closes the gap between code moving and documentation lagging.
The documentation-as-code framing. If the ADR lives in the repo and the pipeline knows about it, you've made neglecting it slightly harder than maintaining it. That's about as good as process enforcement gets — which makes me wonder: where does this fifteen-year-old format go from here?
Yeah, that's exactly what I was thinking. It's suddenly relevant again, but in a completely different way than Nygard originally intended.
The honest answer is I think ADRs are going to get more structured, not less. Right now the format is deliberately loose — MADR gives you fields, but what you put in them is freeform prose. As AI tooling matures, I'd expect to see schemas emerge. Machine-readable consequence types. Standardized tradeoff taxonomies. The kind of thing that lets an agent not just read an ADR but reason across a collection of them.
Which risks killing what makes them useful. The moment you turn an ADR into a form with dropdown menus, you've lost the reasoning. You've got data entry instead of documentation.
That's the tension. Structured enough to be queryable, expressive enough to capture nuance. I don't think that's unsolvable, but it's not solved yet. And I think the teams that figure it out — that find the right level of schema without collapsing into bureaucracy — those are the teams that get the most leverage from AI-assisted development over the next few years.
It's a bit like the difference between a well-maintained type system and an over-engineered one. The types that are too rigid end up getting worked around. The ones that are just structured enough actually get used.
The ADR that's too prescriptive gets skipped. The one that's just structured enough becomes load-bearing. And load-bearing is exactly what you want — you want the documentation to be the thing the team reaches for, not the thing they route around.
ADRs as the connective tissue between how humans make decisions and how AI agents execute them. That's an interesting role for a documentation format to play.
It's almost a new category. Not documentation in the traditional sense — more like persistent institutional cognition.
I'm going to let that one sit. Good place to leave it.
Thanks to Hilbert Flumingtop for producing, and a quick thanks to Modal for the serverless GPU infrastructure keeping this whole operation running. This has been My Weird Prompts — if you've been getting value from the show, a review on Spotify goes a long way.
We'll see you next time.