So Daniel sent us a voice note, and it's a meaty one. He's been thinking about spec-driven development — the workflow that's transformed agentic AI from "this sometimes works" to "this reliably works" — and he's asking whether we can reverse-engineer it. Work backward from what makes AI agents productive to what might make humans more productive. Big projects, overwhelm, the paralysis of staring at a goal so large you can't find the first step. He wants to know what productivity tradition this maps onto, and how you'd actually implement it outside of code generation. Buying a house, changing careers, moving cities — that kind of scale.
And it's a genuinely underexplored angle. Most of the conversation about agentic AI workflows goes in one direction — how do we get humans to manage AI better? Daniel's flipping it. What can the constraints that make AI work teach us about making humans work?
Before we get into it — quick note, today's script is being generated by Claude Sonnet four point six, which feels appropriate given we're about to spend twenty-five minutes talking about how Claude Code's planning mode might be the secret to buying a house.
There's something almost recursive about that, which I appreciate. Okay, so let me set up the technical side first, because I think the details actually matter here — they're not just background.
Go for it.
Spec-driven development in the context of agentic AI, and specifically Claude Code, starts from a recognition that large language models have a context window problem. You cannot hand a model a massive project and expect coherent execution across hundreds of steps, because by the time you're on step eighty, the model has effectively forgotten what it decided in step three. The solution that emerged — and this happened pretty organically across teams working with Claude Code, Cursor, and similar tools — was to externalize memory. You write a plan. A real document. A spec. And that spec becomes the persistent source of truth that the agent refers back to at each step rather than relying on its in-context state.
So the spec is doing the job that the model's memory can't do.
Right. And Claude Code made this explicit with planning mode — a distinct phase where the model is instructed not to execute anything, just to think. You describe the project, you go back and forth, you poke holes, you refine, and only when you've got something solid do you flip into execution mode. The spec sits there the whole time. When you hit a wall, you don't just patch around it — you go back to the spec, update it to reflect what you've learned, and you're now working from a version two that's smarter than version one.
And the micro-task chunking comes from the same constraint.
Completely. Because even with a spec, you can't hand an agent a task that spans a thousand decisions. So you break it down. Each chunk is small enough that the agent can hold the full context of that chunk in its window, execute reliably, hand off a result, and the next chunk picks up from there. The sub-agent model is basically a formalization of that — different agents handle different chunks, they pass state between each other, and the spec keeps them all oriented to the same goal.
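To make that handoff pattern concrete, here's a toy sketch in Python. This is a hypothetical illustration of the idea being described, not Claude Code's actual internals: a shared spec stays the persistent source of truth, and each chunk receives only the previous chunk's result rather than the full history.

```python
# Toy sketch of spec-anchored chunked execution (hypothetical — this is
# an illustration of the pattern, not Claude Code's real architecture).
# The spec is the persistent source of truth; each chunk sees the spec
# plus only the previous chunk's output, never the whole run history.

spec = {
    "goal": "build the feature",
    "phases": ["design schema", "write migration", "update API"],
    "done_criteria": "all tests pass",
}

def run_chunk(phase: str, spec: dict, prior_result: str) -> str:
    # A real agent would invoke a model here; we just record the handoff,
    # showing that each chunk is oriented by the spec's goal.
    return f"{phase} done (guided by goal={spec['goal']!r}, after {prior_result!r})"

def execute(spec: dict) -> list[str]:
    results, handoff = [], "start"
    for phase in spec["phases"]:
        # Only the spec and the last result cross the chunk boundary.
        handoff = run_chunk(phase, spec, handoff)
        results.append(handoff)
    return results

for line in execute(spec):
    print(line)
```

The point of the sketch is the narrow interface: state that matters long-term lives in the spec, and state that matters locally is passed chunk to chunk.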
So the context window is the villain here. Or maybe the hero, depending on how you look at it, because it forced a discipline that arguably produces better outcomes than just letting the model run wild.
That's a really interesting reframe. The constraint created the best practice. And I think Daniel's intuition is that the best practice might be worth keeping even when the constraint doesn't apply — which is the case for humans, who don't have a hard context window in the same way, but who absolutely do get overwhelmed when a project is too large and underspecified.
I want to stay on that for a second, because I think there's something worth naming. The failure mode for humans working on big projects isn't usually that we forget what we decided. It's that we never clearly decide in the first place. The spec is absent from the start.
That's the crux of it. And this is where the productivity literature has actually been circling this idea for decades without quite articulating it in these terms. The closest intellectual ancestor is probably Getting Things Done — David Allen's system from two thousand and one. The core insight in GTD is that your brain is terrible at holding open loops. Every unresolved project, every vague intention, every "I should really do something about X" — that all sits in working memory and creates what Allen calls psychic RAM drain. The solution is to capture everything externally, clarify what the actual next action is, and trust the system rather than your head.
Which is essentially what the spec does for the AI. Externalize the state so the agent doesn't have to hold it.
The parallel is almost uncomfortably direct. But GTD has a weakness, and I think it's relevant here. GTD is brilliant at the task level — it's very good at "what's the next physical action?" But it's less good at the project architecture level. If you've got a goal like "buy a house," GTD tells you to capture it, clarify the next action — maybe "call three mortgage brokers" — and put it in your system. What it doesn't give you is a structured way to think through the full shape of the project before you start executing.
So it's good at the leaves but not great at the tree.
Exactly — that's the right way to put it. And this is where spec-driven development has something genuinely new to add, because the planning phase is explicitly upstream of execution. You're not allowed to do anything until you've thought it through. GTD doesn't enforce that. In fact, GTD can sometimes encourage premature action, because the system rewards checking things off.
I've seen that. The person who has a beautifully organized task list and is very busy and somehow never makes progress on the thing that actually matters.
Because the task list is full of next actions that feel productive but don't move the project. And the project was never properly specced in the first place, so there's no way to know which actions are actually load-bearing.
What about Objectives and Key Results? That's another framework that gets thrown at big projects.
OKRs are interesting because they operate at a higher altitude. An OKR says: here's the objective, here are three to five measurable results that would tell us we've achieved it. That's actually pretty close to the spec's role in defining success criteria. The problem with OKRs as a personal productivity tool — as opposed to a team management tool — is that they don't really help you with the execution structure. They tell you where you're going but not how to chunk the journey.
So the spec-driven approach is sort of filling a gap between OKRs at the top and GTD at the bottom.
That's a really clean way to describe it. The spec is the bridge layer. It's more granular than "I want to buy a house in the next twelve months" and more architectural than "call three mortgage brokers by Friday." It says: here are the phases of this project, here are the dependencies between phases, here are the decision points where I'll need to pause and reassess, and here is what done looks like at each stage.
I want to bring in Daniel's personal note here because I think it's doing real work in the question. He said he struggles enormously when he has too many big projects running simultaneously, and he's great at executing but gets overwhelmed when the task is too undefined. That's a very specific failure mode. It's not laziness, it's not disorganization — it's the gap between goal and first step.
And that gap is where most projects die. The goal is clear enough — "I want to move to a bigger apartment" — but the distance between that and any concrete action feels so vast that the brain just... refuses to engage. Call it goal-level paralysis, the planning fallacy's ugly cousin: the goal is too abstract to generate action, and the person hasn't done the work to translate the goal into a structure that does generate action.
So the spec is the translation work.
The spec is the translation work. And here's the thing about how Claude Code does it that I think is instructive for the human case — the planning conversation is not a solo exercise. You're talking to the model. You're saying "I want to build this thing" and the model is asking you questions, surfacing things you hadn't thought about, pointing out dependencies you'd missed. The output is better than what you'd produce alone because the back-and-forth forces you to be explicit about things you'd otherwise leave vague.
Which is interesting because it implies the human equivalent isn't just "write a plan in a notebook." It's "have a structured conversation about the plan before you write it."
And this is where I think the emerging reality of personal AI agents becomes genuinely exciting rather than just hypothetical. Because right now, if you want to do spec-driven development for buying a house, you could do it with a general-purpose AI. You sit down with Claude or whatever you're using and you say — here's my goal, here's my timeline, here are my constraints, let's build a spec. And you go back and forth until you have something real. That's available today.
Let's actually walk through what that looks like, because I think it's easy to nod at this abstractly and not really understand what the spec contains.
Okay, let's use the house example. Stage one of the planning conversation is scope and constraints. What's the goal in concrete terms? Not "buy a house" but — what size, what location, what price range, what timeline, what are the non-negotiables versus the nice-to-haves? This sounds obvious but most people skip it because it feels like they already know. They don't. They have a vague feeling, not a spec.
And the AI is useful here precisely because it asks the questions you'd skip if you were writing alone.
Stage two is dependency mapping. What has to happen before other things can happen? You can't make an offer before you have mortgage pre-approval. You can't get pre-approval before you've checked your credit score and gotten your income documentation in order. You can't view houses productively before you've defined your search criteria. These dependencies create a sequence, and the sequence tells you what the first actual chunk of work is — which is almost never what people think it is.
The first chunk is rarely the exciting stuff.
Almost never. Stage three is identifying the decision points — the moments in the project where you'll need to pause, gather new information, and potentially update the spec. In a house purchase, one of those is after your first round of viewings, when you discover that your criteria need adjustment because reality doesn't match what you imagined. That's a spec update. You don't abandon the project, you update the document and continue from a smarter version.
This is the version two moment. And I think this is actually the part of the framework that most productivity systems handle worst. They treat the plan as either fixed — you made a plan, stick to it — or they don't have a plan at all. The idea that the plan is a living document that you deliberately update at predetermined checkpoints is different.
It's different and it's important. Because the alternative is either rigidity — "I said I'd buy a three-bedroom and I'm buying a three-bedroom even though I've now seen the market and understand that's not realistic" — or drift, where the project just gradually loses coherence because every small deviation from the original intention was never formally reconciled.
And drift is the more common failure mode for self-directed projects.
By a lot. Stage four is chunking. Once you've got the dependency map and the decision points, you break the project into chunks that are each small enough to be executable in a defined period — a week, two weeks, whatever fits your life. Each chunk has a clear deliverable. Not "work on mortgage stuff" but "have pre-approval letter from at least two lenders in hand." The deliverable is specific enough that you know unambiguously when you've done it.
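Pulling those four stages together — and to be clear, every detail here is invented for illustration — a saved spec for the house project might be a single document along these lines:

```markdown
# Spec: Buy a house (v2 — updated after first round of viewings)

## Goal & constraints
- 3 bedrooms, within 30 minutes of work, defined price ceiling
- Timeline: 12 months. Non-negotiable: outdoor space. Nice-to-have: garage.

## Dependencies
credit check → pre-approval → defined search criteria → viewings → offer → closing

## Decision points
- After first round of viewings: reassess criteria, update this spec
- After pre-approval: confirm the price range is realistic

## Current chunk (due end of month)
- Deliverable: pre-approval letters in hand from at least two lenders
```

The format matters far less than the fact that phases, dependencies, decision points, and the current deliverable all live in one place you can reopen.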
The specificity is load-bearing. "Work on mortgage stuff" is just a vague intention wearing a task's clothing.
And this is exactly what the sub-agent model solves in the AI context. Each sub-agent gets a well-defined task with clear inputs and expected outputs. The ambiguity is resolved at the spec level, not delegated to the agent to figure out on the fly. For humans, the equivalent is resolving the ambiguity in the planning phase so that when you sit down to execute, you're not also having to figure out what you're doing.
There's a cognitive load argument here too, isn't there? Because if you're simultaneously figuring out what to do and doing it, you're using the same mental resources for two different jobs.
David Marquet, the submarine commander who wrote Turn the Ship Around, talks about this in a slightly different context — the cost of switching between thinking and doing. The planning phase and the execution phase use genuinely different cognitive modes. Trying to do them simultaneously is like trying to write and edit a sentence at the same time. You get worse versions of both.
And the spec creates a clean handoff between the two modes. You've done the thinking. Now you're just executing.
Which is liberating, actually. Because when you're in execution mode with a good spec, you're not second-guessing yourself at every step. The second-guessing happened in the planning phase, where it was cheap. Now it's expensive because you're in motion, and the spec gives you permission to just move.
I want to ask about the overwhelm piece specifically, because Daniel named it directly. He said he gets overwhelmed when there are too many big projects running simultaneously. And I think the spec-driven approach addresses this in a way that's worth making explicit.
The overwhelm of multiple simultaneous big projects is almost always an information problem masquerading as a capacity problem. It feels like too much to do, but the real issue is that all of these projects are living as vague, unresolved intentions in working memory, each one pulling at attention without being actionable. The spec externalizes them. Once a project has a proper spec and a clear next chunk, it stops consuming background mental resources because your brain trusts that it's captured and structured. You don't need to keep thinking about it.
It's Allen's open loop idea but applied at the project architecture level rather than the task level.
Right. And when you've got five big projects, the spec-driven approach doesn't make them smaller — it makes them navigable. You know exactly where each one is, what the next chunk is, and when you'll revisit the spec. You can close the mental tab on the four you're not working on today.
Let's talk about the personal agent piece, because I think Daniel's right that this is coming fast and it changes the calculus.
The current state is that you can use a general AI assistant as a thinking partner for the planning phase — and that's already genuinely useful. But what's emerging, and I think we're probably twelve to eighteen months from this being a mainstream experience, is AI that maintains persistent context about your projects across sessions. So the spec isn't just a document you wrote once — it's something your agent knows, updates, and uses to proactively surface what your next chunk should be given your current state.
That's a meaningful jump. Because right now, even if you write a beautiful spec, you still have to remember to look at it.
And most people don't. The spec degrades into a document you wrote that one time and haven't opened since. The agent changes that because it's the entity maintaining the spec, tracking progress against it, flagging when a decision point is approaching, and asking you at the start of a session — here's where we are, here's what the plan says, do you want to update anything before we start executing?
So the agent is doing what a really good executive assistant would do. Holding the project state so you don't have to.
And doing it without the social complexity of managing a human assistant. Which is not nothing — a lot of people who would benefit from that kind of support don't have access to it, or find the management overhead too high.
There's a democratization argument here that I find genuinely compelling. Spec-driven project management with an AI partner is available to anyone right now at very low cost. The kind of structured planning support that used to require a project manager or a coach or a very organized friend is now just... a conversation you can have.
And the quality of the spec you produce in that conversation is surprisingly high, because the AI has processed a lot of material about project management, common failure modes, dependency structures — it's bringing that to the table even if you haven't read a single productivity book.
What would you say to someone who's skeptical? Who says, look, I've tried productivity systems before, I've written plans before, they always fall apart within two weeks, why is this different?
That's a fair challenge. And I think the honest answer is that the spec-driven approach has a few structural features that make it more durable than a typical plan. First, it's built around explicit decision points rather than a fixed trajectory — so when reality diverges from the plan, you have a sanctioned moment to update rather than just watching the plan become irrelevant. Second, the chunking means you're never more than one chunk away from visible progress, which matters for motivation. Third, and this is underrated — the planning conversation itself creates a kind of commitment that writing alone doesn't. When you've spent an hour talking through a project, articulating it, defending your assumptions, you have more psychological investment in the spec.
The process of building the spec is part of what makes you believe in it.
There's also a meta-skill here. The first time you do this, it's awkward. You don't know how granular to be, you're not sure where to draw the chunk boundaries, the spec feels either too vague or too detailed. But it's a learnable skill, and it gets better fast. By the third or fourth project you've specced out, you have a feel for it.
What's the practical entry point for someone who wants to start doing this but doesn't know where to begin?
I'd say pick one project — not the most important one, not the most overwhelming one, but one where you've been stuck and you care about the outcome. And before you do anything else on that project, spend thirty minutes having a planning conversation with an AI. Not asking it to write a to-do list for you — actually talking through the project. What's the goal in concrete terms? What are the constraints? What has to happen before other things can happen? What are the moments where you'll need to stop and reassess? Let it ask you questions. Push back on your assumptions. At the end, ask it to summarize the spec in a format you can save and refer back to.
And then actually refer back to it.
That's the part that requires discipline. But if you do it — if you look at the spec at the start of each work session on that project and ask yourself "am I working on the right chunk right now?" — you will notice a difference in how much you accomplish versus how busy you feel.
I think that distinction is worth sitting with. Busy versus accomplishing. Most people who feel overwhelmed by big projects are not idle. They're doing things. They're just not doing the things that move the project, because the project was never structured well enough to make it clear which things those are.
And that's the fundamental promise of the spec. Not that you'll work harder or longer, but that the work you do will be load-bearing.
Alright, practical takeaways. Give me the version someone can actually implement this week.
First: identify one big project that's been sitting as a vague intention. Not a task, a project — something with multiple phases and a timeline of weeks or months. Second: before touching any execution on that project, have a thirty-minute planning conversation with an AI. Use it to produce a written spec — phases, dependencies, decision points, chunked deliverables. Third: save the spec somewhere you'll actually see it. Not buried in a folder, somewhere that's part of your regular workflow. Fourth: for the next four weeks, start each session on that project by looking at the spec, identifying which chunk you're in, and working only on that chunk. Fifth: at the end of each chunk, before moving to the next one, do a five-minute spec review. Is the plan still accurate? Does anything need to be updated given what you've learned?
And the sixth one, which you didn't say but I'll add: don't do this for every project simultaneously. Start with one. The skill compounds, but it also has an on-ramp.
That's an important addition. The overhead of spec-building is real, and if you try to spec every project in your life at once, you'll end up with a bunch of half-built specs and no execution. One project, done properly, will teach you more than five projects done partially.
There's something almost ironic about the fact that the best practices for managing AI agents turn out to map so cleanly onto what thoughtful people have been saying about human productivity for years. Allen, the OKR folks, Marquet — they were all circling this. The AI context just made the underlying logic inescapable because the failure modes are so visible when a model goes off the rails.
And I think that's actually the most interesting thing Daniel's prompt surfaces. We've been treating AI as a tool we need to learn to manage. But the discipline of managing AI well might be teaching us something about how minds — artificial or biological — handle complexity. The context window isn't just a hardware limitation. It's a metaphor for the real cognitive constraints that have always existed for humans working on large, ambiguous projects. The solutions that work for AI work for humans because they're solving the same underlying problem.
Which is: how do you maintain coherent intention across a long, complex, multi-phase project when the thing that's executing — whether it's a language model or a person — can only really attend to a small slice of the whole at any given moment.
And the answer is: you externalize the architecture. You write the spec. You chunk the work. You update the plan at defined moments rather than letting it drift. You separate thinking from doing. None of that is new. But seeing it work so visibly in the AI context might be what finally makes it stick for people who've heard the advice a hundred times and never quite committed to it.
I'm going to go write a spec for my napping schedule. Finally bring some structure to the chaos.
You already have the most optimized napping workflow of anyone I've ever met.
Thank you. I call it ancestral sloth methodology. Predates GTD by about thirty million years.
I believe zero percent of that.
And yet here we are. Alright — thanks to Hilbert Flumingtop for producing this one. And a quick thanks to Modal for keeping our pipeline running smoothly — serverless GPU infrastructure that honestly just works, which is all you want from infrastructure. This has been My Weird Prompts. If you want to find all two thousand one hundred and forty-six episodes, head to myweirdprompts.com — and while you're there, leaving a review genuinely helps more people find the show.
Until next time.