So Daniel sent us this one, and it's very much in his wheelhouse. He's asking about agentic AI consulting — specifically, how external builders are supposed to scope and price these projects when the systems themselves are fundamentally non-deterministic. He wants to know what emerging patterns exist for setting clear milestones and objectives, and how consultants avoid the nightmare scenario of runaway scope. And given that Gartner is predicting nearly half of all agentic AI projects will be scrapped by end of next year, the stakes here are pretty real.
Herman Poppleberry, by the way, for anyone just joining us. And yeah, this topic has been sitting in the back of my mind for a while because the structural problem Daniel is pointing at is genuinely novel. It's not just "consulting is hard." The specific challenge with agentic AI is that you're being asked to deliver fixed-price certainty on top of a system that, by design, operates in uncertainty. That tension is baked into the architecture.
Before we get into the frameworks, let's just make sure we're clear on why this is different from, say, scoping a regular software project. Because on the surface it sounds like the same old story — client wants everything, consultant has to draw a line.
The surface similarity is exactly what makes it dangerous. With traditional software, if a client asks you to add a feature, you can estimate it. Two days, five days, whatever. The execution is deterministic. With an agentic system, you're not just writing code — you're designing a decision-making process that will encounter situations you didn't anticipate, use tokens you didn't budget for, and potentially take actions you didn't explicitly authorize. And here's the part that catches people off guard: because generating code with these agents is now essentially free, there is practically nothing stopping an agent from pursuing every avenue that would previously have been cost-prohibitive. Every "and now can you just...?" from the client gets executed immediately. The execution is free. The maintenance is not.
Wes McKinney wrote about this in February, right? The Mythical Agent-Month.
He did, and it's worth dwelling on because the essay is a bit of a landmark for this space. McKinney — who created pandas and knows a thing or two about large-scale software complexity — directly applies Fred Brooks' 1975 concept of the Mythical Man-Month to agentic development. And his core provocation is this: when generating code is free, knowing when to say no is your last defense. The bottleneck in agentic AI delivery is no longer execution speed. It's design taste and product scoping.
Which is a profound reframe for how consultants should think about their own value. You're not being paid to type faster anymore.
And that's the thing most consultants in this space haven't fully internalized yet. McKinney describes what he calls the agentic tar pit — parallel agent sessions locked in combat with code bloat and incidental complexity that the agents themselves generated. He also flags what he calls the brownfield barrier: agentic codebases become increasingly unmanageable past around a hundred thousand lines of code. Every new change has to hack through the jungle created by prior agent sessions. That's technical debt accruing at machine speed, not human speed.
So the client says "can you just add this one thing" and the agent says "sure" and three hours later you have forty thousand new lines of code that nobody fully understands.
And that's not a hypothetical. McKinney is describing this as something he's actively observing. He also raises a really uncomfortable question about Conway's Law — the idea that software architecture mirrors the communication structure of the team that built it. What does Conway's Law look like when your team has no persistent memory and no shared understanding of the system they're building? Different agent sessions can produce contradictory architectural decisions that a human has to reconcile after the fact. And that reconciliation cost doesn't show up in anyone's project estimate.
Okay, so the problem is clear and it's genuinely scary. What are consultants actually doing about it? Because there has to be some emerging structure here.
There is, and it coalesces around a few key ideas. The first and probably most important is the discovery sprint, sometimes called an exploration milestone. This is not a new concept in software consulting, but the agentic context makes it more critical, not less, because the unknowns are different. In traditional software you're mostly figuring out requirements and technical constraints. In agentic AI you're also trying to answer: is the client's data actually good enough to support this? Can they even define what success looks like? Is the use case one where an autonomous agent is appropriate at all?
And that last question is surprisingly often "no."
More often than clients expect. The Stack.expert consulting framework, which came out last October, identifies the discovery sprint as the single most important scope protection tool for this category of work. The pricing guidance they put out is interesting: you're looking at five thousand to twenty thousand dollars for small-to-medium projects, which should represent roughly fifteen to twenty-five percent of the anticipated full project value. It runs two to three weeks, and the deliverable is not a vague "we looked at your situation" document — it's a data quality assessment, technical feasibility testing with small working prototypes, stakeholder interviews, and a detailed roadmap grounded in actual findings rather than optimistic assumptions.
And crucially, it ends with a go or no-go recommendation. Which means the consultant has to be willing to tell the client their project shouldn't proceed.
Which is where a lot of consultants lose their nerve. But this is exactly the point. If you price the discovery sprint correctly — high enough to be profitable as a standalone engagement — then recommending a no-go is not a financial catastrophe. You got paid for strategic thinking. The client got clarity before committing the larger budget. If they proceed, the exploration fee gets credited toward the total. It's a genuinely elegant structure because it aligns incentives correctly.
By the way, today's script is powered by Claude Sonnet four point six, which feels appropriate given we're talking about AI systems and the people trying to charge money to build them.
There's something very recursive about that.
Very. So after the discovery sprint, what does the engagement structure actually look like?
The framework that's emerged from multiple practitioners converges on a three-phase structure. Phase one is exploration and prototyping — feasibility work, architecture decisions, a small working prototype. Typical price range ten thousand to twenty-five thousand dollars. Phase two is what they're calling the Minimum Viable Agent, or MVA — core agent functionality, key integrations, the thing that actually works in a constrained scope. That runs twenty-five thousand to seventy-five thousand. Phase three is production deployment and optimization — hardening the system, setting up monitoring, establishing service level agreements. That's thirty-five thousand to a hundred thousand plus.
And each phase has defined success criteria and natural exit points.
That's the critical design principle. Each phase has to have a point where both parties can look at what was built and decide whether continuing makes sense. This is what prevents the agentic tar pit from swallowing the project. You're not writing a contract for "build us an autonomous agent that handles our entire customer service operation." You're writing a contract for phase one, with explicit criteria for what phase one success looks like, and a separate agreement for phase two contingent on those criteria being met.
The Minimum Viable Agent concept is doing a lot of heavy lifting here. Can you unpack what actually goes into defining one?
The MVA framework from Invimatic — their January 2026 planning guide is one of the cleaner articulations of this — has seven components. You need to define the objective, which is the specific business problem the agent solves. Not "automate the department," but something like "eliminate the manual step of pulling daily sales figures from three different systems and formatting them into the morning report." You need to define who actually interacts with or benefits from the agent. You need to define the trigger event — what initiates the workflow. You need to specify agent abilities, meaning what can it read, write, decide, or execute, and crucially what can it not do. Then success metrics, dependencies, and constraints.
The constraints piece is interesting. Because I'd guess most clients want to skip that part.
Every client wants to skip that part. But constraints are where you define the scope boundary. Cost limits, latency requirements, security restrictions, accuracy thresholds — these are not optional additions to the spec. They are the spec. An agent that has no defined cost ceiling will find a way to spend everything. An agent that has no defined accuracy floor will happily produce confident-sounding wrong answers.
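To make the seven components concrete, here's a minimal sketch of what an MVA spec could look like as a structured document. The field names and every example value are illustrative assumptions, not taken from the Invimatic guide:

```python
from dataclasses import dataclass

@dataclass
class MVASpec:
    """Minimum Viable Agent spec: the seven components as explicit fields."""
    objective: str                     # specific business problem, not "automate the department"
    users: list                        # who interacts with or benefits from the agent
    trigger: str                       # event that initiates the workflow
    abilities: dict                    # what the agent may read/write/decide/execute
    success_metrics: dict              # measurable thresholds for "done"
    dependencies: list                 # systems and data the agent relies on
    constraints: dict                  # cost ceilings, accuracy floors, latency limits

# Hypothetical spec for the morning-report example mentioned earlier.
spec = MVASpec(
    objective="Consolidate daily sales figures from three systems into the morning report",
    users=["sales-ops team"],
    trigger="scheduled: weekdays at 06:00",
    abilities={
        "read": ["crm", "billing", "warehouse"],
        "write": ["report-doc"],
        "execute": [],  # explicitly empty: no side effects beyond the report
    },
    success_metrics={"report_accuracy": 0.99, "delivered_by_hour": 7.0},
    dependencies=["crm API", "billing export", "warehouse DB"],
    constraints={"max_daily_cost_usd": 5.0, "max_latency_minutes": 30.0},
)
```

The point of the structure is that an empty or missing field is visible at a glance — a spec with no `constraints` entry simply shouldn't pass review.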
And there's a philosophy underneath the MVA concept that's worth naming, which is the crawl, walk, run, fly progression.
OrangeMantra articulates this well in their ROI guide. Start with a single hyper-specific task. Not "automate the whole department" — automate the one thirty-minute daily reporting task that everyone hates and that has a very clear definition of done. You're looking for a quick, undeniable win. Then you connect a few tasks into a simple workflow. Then you expand to handle a full business process with some decision-making autonomy but within a well-defined sandbox with human oversight on critical steps. End-to-end autonomy — the "fly" stage — only becomes achievable after you've successfully navigated the earlier stages and built up trust in the system's behavior.
And the trust piece is actually a major failure mode in its own right.
Gartner's analysis of why they expect over forty percent of these projects to be canceled by end of next year identifies four primary failure modes. Unmanageable complexity is one — the Rube Goldberg machine problem where the interactions between agents become an unmaintainable black box. Ill-defined use cases is another — investing in agentic AI because it's the next big thing rather than because you've identified a specific messy problem it can solve. Budget black holes, which we'll come back to. But the trust gap is the one I find most insidious because it's a failure mode that looks like a success for a while and then collapses.
The agent does something unexpected, everyone panics, and suddenly you have so many human-in-the-loop checkpoints that all the efficiency gains are gone.
And you end up with an expensive system that requires constant babysitting. What's interesting is that human-in-the-loop design, done right, is actually the solution to this problem rather than a symptom of it. The Wednesday.is framework — their technical implementation guide from last July — makes the point that human-in-the-loop checkpoints aren't just a safety mechanism, they're a scope definition tool. When you define explicitly which decisions belong to the agent and which require human review, you create natural billing milestones and you prevent the runaway autonomy that erodes client trust.
Which is a completely different way to think about human oversight. Not as a limitation you're apologizing for, but as a structural feature of the contract.
Moxo's process orchestration philosophy frames it this way: separating human judgment from AI execution is the key to cost modeling. When you know which work belongs to agents and which belongs to people, you can actually forecast costs. AI handles high-volume coordination at scale. Humans handle fewer, higher-stakes decisions. That clarity makes budgeting possible in a way that pure agentic systems don't allow for. Some practitioners are even experimenting with billing by decision point — charging for each human review and approval gate rather than by time or by outcome. It aligns cost with the actual scarce resource, which is human judgment, rather than the abundant resource, which is agent execution.
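Billing by decision point is easy to sketch once the decision routing is explicit. Everything here is a hypothetical illustration — the decision categories, the per-gate fee, and the default-to-human rule are assumptions, not part of the Moxo framework:

```python
# Illustrative sketch: route each decision type to the agent or a human
# reviewer, then meter the human review gates as the billable unit.

AGENT_DECIDES = {"fetch_data", "format_report", "draft_reply"}
HUMAN_REVIEWS = {"send_external_email", "issue_refund", "change_schema"}

REVIEW_GATE_FEE_USD = 40.0  # assumed per-gate rate, for illustration only

def route(decision: str) -> str:
    """Anything not explicitly delegated to the agent goes to a human."""
    return "agent" if decision in AGENT_DECIDES else "human"

def invoice(decisions: list) -> float:
    """Bill only for human review gates — the scarce resource."""
    gates = sum(1 for d in decisions if route(d) == "human")
    return gates * REVIEW_GATE_FEE_USD
```

Note the default in `route`: unclassified decisions fall to human review, which is the conservative choice when the agent encounters something outside the spec.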
Let's talk about pricing models more broadly, because there's a taxonomy here that I think a lot of consultants haven't fully worked through.
Tien Tzuo, who runs Zuora and has been thinking about subscription and usage pricing for a long time, published a breakdown of four emerging models for agentic AI services last August. The first is per-agent pricing — essentially like hiring an employee. You pay for the agent to be available whether it's working or idle. Nullify, the security vulnerability tool, charges eight hundred dollars per agent per year on this model. Works when outcomes are diffuse and long-term and you want predictable billing.
The second is per-activity, which is the metered model.
Pay when the AI does something — answers a question, writes code, runs a process. Devin charges by what they call Agent Compute Units. Microsoft Security Copilot has Security Compute Units. It's granular and works well for frequent, well-defined tasks where you can instrument the activity reliably.
Third is per-output — you pay for what the agent produces.
Replit charges twenty-five cents per checkpoint, which they define as a meaningful code change. Salesforce charges two dollars per customer service interaction resolved. The per-output model is appealing because it feels closer to paying for results, but the definition of "output" still requires careful contractual work. What counts as a meaningful code change? Who decides?
And then the fourth is per-outcome, which is the one everyone says is the future and almost nobody is actually doing.
The Subscribed Institute looked at around sixty agentic AI services and found that pure outcome-based pricing represents less than ten percent of them. And Tzuo's own analysis points to why: defining and tracking outcomes is enormously complicated. Zendesk charges only when a support ticket is fully resolved, which sounds clean, but apparently required massive amounts of documentation just to define what "resolved" means. Does it mean the ticket is closed? Does it mean the customer hasn't reopened it within forty-eight hours? Does it mean the customer satisfaction score was above a threshold?
And that definitional problem gets worse when you're a consulting firm rather than a product company, because you have even less control over the outcome.
This is where Luk Smeyers makes a really important contrarian argument. He's an AI consulting expert who has been pushing back hard on the "outcome-based pricing is inevitable" narrative. His position is essentially: we see no evidence that AI changes the level of control a consulting firm has over client outcomes. A client's bad data, organizational resistance, or changing business priorities can tank an agent's performance regardless of how well it was built. So if you're pricing on outcomes, you're absorbing risk that isn't yours to absorb. The more honest conversation is about risk allocation — who bears the risk when the agent underperforms, and how do you write that into a contract explicitly?
Which is a much harder conversation than "we charge for results," but probably the right one.
BCG's analysis of B2B software pricing in the agentic era lands in a similar place. Their view is that hybrid pricing — a base subscription or retainer plus variable usage or outcome components — is likely to dominate this transitional period. It balances value alignment with the practical complexity of pure outcome-based models. And there's a finding from Leanware's research that's worth sitting with: seventy-three percent of consulting clients now prefer pricing models tied to measurable business outcomes rather than time spent. But that preference doesn't mean they're ready to pay on pure outcomes. It means they want the framing to be outcome-oriented even if the billing mechanism is project-based or hybrid.
So clients want to feel like they're paying for results even if the contract is structured around milestones.
Which tells you something important about how to present the pricing conversation. You frame it around the four ROI archetypes that actually resonate with enterprise buyers. Operational excellence — reduce costs, automate repetitive cognitive work. Risk mitigation and compliance — eliminate human error in high-stakes processes. Top-line growth — accelerate revenue generation. And strategic innovation — augment human creativity, accelerate time to market. The first two are easiest to sell and measure. The third is the most compelling to a C-suite but the hardest to attribute. The fourth is highest risk, highest reward.
And for an automator — a consultant improving an existing process rather than building something new — the ROI math is actually pretty tractable.
The Stack.expert framework makes a useful distinction between "builders" and "automators." For automators, value-based pricing works well because the ROI is tangible. If you're automating a process that costs twenty hours a week at fifty dollars an hour loaded cost, that's fifty-two thousand dollars a year in labor. You price the project at fifteen to twenty-five thousand dollars, which gives the client a return on investment in six to ten months. That's a conversation that closes. Post-implementation retainers for ongoing optimization run two thousand to eight thousand a month.
The hourly rates in this space are worth noting for context, because I think a lot of people outside the field don't realize how high the ceiling is.
Junior AI consultants are running a hundred to a hundred and fifty an hour. Mid-level is a hundred and fifty to three hundred. Senior generative AI specialists are at three hundred to five hundred plus. The average salary equivalent for a generative AI specialist is around a hundred and sixty-one thousand dollars a year, with top earners above two hundred and seventy-eight thousand. The market is compensating for genuine scarcity of people who understand both the technical architecture and the business problem well enough to scope these projects correctly.
Which brings us back to McKinney's most uncomfortable prediction, which is that the median consulting shop is completely toast.
He doesn't soften it. His argument is that agentic AI commoditizes the execution layer entirely. If any reasonably capable person with access to Claude Code or Devin or whatever the current generation of coding agents is can produce working software at machine speed, then the value proposition of "we will write the code for you" collapses. What's left is the top tier of problems — those requiring genuine expert humans in the loop, the ones where design judgment and domain knowledge are irreplaceable. And his claim is that only the consultants operating at that level will survive.
I find that partially convincing and partially overstated. Because there are a lot of clients who need someone to sit across from them and say "here is what you actually need, and here is how we're going to build it carefully." That translation and trust function doesn't get commoditized just because the code generation does.
The counterargument is that the translation function does get easier with better tools, and the market for "someone who can run an agent responsibly on your behalf" is much larger than the market for "someone who can write code from scratch." But I think you're right that the trust and judgment layer — the person who can look at a client's problem and say "this is actually a three-week project, not a six-month one, and here's why" — that layer holds value. The issue is whether the margins are sufficient to sustain a firm of any scale.
There's also the total cost of ownership problem that clients consistently underestimate, which has direct implications for how consultants should frame their proposals.
The Moxo analysis from February is pretty stark on this. Integration complexity alone can increase implementation costs by thirty to fifty percent above the licensing or build cost. You have implementation and integration — connectors, data pipelines, workflow configuration, governance setup — that often equal or exceed year-one licensing. You have infrastructure and compute, which are ongoing costs outside the vendor contract. You have training and change management, because adoption doesn't happen automatically. And you have ongoing optimization, because agentic AI systems require sustained investment in monitoring, maintenance, and iteration.
And then there's the budget black hole failure mode, which is the one that gives me the most anxiety.
Consumption-based pricing on AI systems can produce overnight cost explosions that are genuinely alarming. A slight change in an agent's workflow can lead to a ten-times surge in cost with no warning. CIO magazine has documented cases of teams waking up to hundreds of thousands of dollars in unexpected charges because an algorithm ran inefficiently overnight. For consultants, this means that any proposal for an agentic system needs to include explicit cost ceiling mechanisms — hard token budgets, rate limiting, alerting thresholds — as scope items, not afterthoughts.
The Gartner warning actually functions as a sales tool if you use it correctly.
This is a point worth making explicitly. Nearly half of agentic AI projects will be scrapped by end of next year, and the primary reasons are poor scoping, unclear ROI models, and unmanageable complexity — not technical failure. For a consultant who has a rigorous discovery sprint process, a phased delivery structure, an MVA framework, and explicit HITL checkpoints built into their methodology, that Gartner number is your best marketing material. You lead with "here's why most of these projects fail" and you follow immediately with "here's exactly how our process prevents that." The clients who are sophisticated enough to understand the Gartner data are exactly the clients you want.
And the ones who aren't sophisticated enough to understand it are probably the ones who will pressure you to skip the discovery sprint because they think they already know what they want.
Which is the tell. A client who resists paying for a discovery sprint is a client who believes they can define the scope of an agentic AI project without actually investigating their own systems and data. That belief is almost always wrong, and the consultant who agrees to skip discovery to win the deal is the one who ends up absorbing the cost of the unknowns that the discovery sprint would have surfaced.
Okay, practical takeaways. If you're a consultant building agentic AI systems right now, what are the three things you actually do differently based on everything we've covered?
First: never start without a paid discovery sprint. Price it at fifteen to twenty-five percent of the anticipated project value, make it standalone-profitable, and deliver a go or no-go recommendation with genuine honesty. If the project shouldn't proceed, say so. Your reputation for accurate scoping is worth more than any single engagement.
Second: define your Minimum Viable Agent with all seven components before writing a line of code. Objective, users, trigger event, agent abilities, success metrics, dependencies, and constraints. The constraints especially — cost ceilings, accuracy floors, latency limits, scope boundaries. These are not negotiable additions. They are the contract.
Third: use human-in-the-loop checkpoints as billing milestones, not just safety mechanisms. Define explicitly which decisions belong to the agent and which require human review. Phase A completes autonomously; human sign-off triggers Phase B and the associated invoice. This creates the project rhythm that both parties can track, prevents the runaway autonomy that erodes trust, and gives you natural renegotiation points if the scope needs to change.
And for the pricing model question specifically?
Hybrid is where the market is going. A base project fee structured around the three-phase delivery model, with a variable component tied to usage or output metrics, and framing that is explicitly outcome-oriented even if the billing mechanism isn't pure outcome-based. And for longer-term engagements, a post-implementation retainer that covers the ongoing optimization work that clients systematically underestimate and that is genuinely valuable.
The deeper point underneath all of this is that the scarce resource has shifted. It used to be execution. Now it's judgment.
McKinney's line stays with me: "When generating code is free, knowing when to say no is your last defense." That's not just a quip. It's a description of what consulting value actually is in this era. The consultants who understand that their job is to exercise taste over enormous volumes of output — to hold the conceptual model of the system in their heads and be shrewd about what to build and what to leave out — those are the ones who will be doing interesting work in five years. The ones who think their job is to make the agent go faster are going to find that the agent can already do that without them.
Alright, that's a good place to land. Big thanks to our producer Hilbert Flumingtop for keeping this ship moving. And thanks to Modal for providing the GPU credits that power the show — genuinely couldn't do this without them. This has been My Weird Prompts. If you want to subscribe or find the RSS feed, head to myweirdprompts dot com. Until next time.
See you then.