Daniel sent us this one, and it's a good one. He's been building a situational report generator for the security picture in Israel — basically a multi-agent pipeline that pulls together a daily news summary, shaped by what he actually wants to know and what he wants to tune out. His motivation is something a lot of us feel: the news is full of speculation, light on concrete information, and when you're living in the middle of it, that's genuinely frustrating. But the question he's really asking is bigger than his pipeline. He's asking whether the framework you choose — LangGraph, Pydantic, Deep Agents — actually changes the output. Same model, same tools, same prompts, different harness. Does the plumbing change the water?
That question gets right at something most people miss about agentic AI right now. Before we dive in — quick note, today's script is coming to us from DeepSeek V four Pro.
I've been impressed with what it can do.
Yeah, it's been strong. But back to Daniel's question. What I love about this is he's identified something that the benchmarks completely miss. Benchmarks measure the model. They measure the raw capability. But nobody's measuring the harness. Nobody's measuring whether LangGraph versus Pydantic versus something else produces a meaningfully different output given identical inputs. And I think the answer is yes, it does — sometimes in ways that are subtle, sometimes in ways that are dramatic. But the differences aren't where most people would look for them.
Alright, let's get concrete. Daniel's pipeline right now is LangGraph-based. He's asking: if he'd built the same thing with Deep Agents or Pydantic, would his morning security summary actually read differently?
Let me start with Deep Agents, because Daniel mentioned he looked at it and backed away — which I think was smart. Deep Agents takes what I'd call the maximalist approach to autonomy. The philosophy is basically: give the agent the tools, give it the goal, and let it figure out the path. It can query sources, it can ask clarifying questions, it can recursively dig deeper. Daniel described it perfectly — it's like an eager student who just keeps saying "tell me more, tell me more, tell me more." And that's not a bug in the implementation, that's a structural property of the architecture.
Why is the loop baked in?
Because Deep Agents is built on the idea that the agent should be able to determine for itself when it has enough information. There's no hard stop, no pre-defined graph of nodes. The agent is given a research objective and it decides: do I need more? Do I need to look at another source? Do I need to clarify what I'm looking for? The problem is that for something like a security situational report, there's always more. There's always another source, another angle, another detail. The agent can't naturally reach a point where it says "I'm done" because the information space is unbounded.
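To make that structural point concrete, the open-ended pattern boils down to something like the toy loop below. This is not the actual Deep Agents API; the function names are stand-ins. It just shows where the exit lives: inside the model's own judgment.

```python
# Toy illustration of an open-ended agent loop (not the actual Deep Agents API).
# The only stopping condition is the model deciding it has enough.

def model_wants_more(notes: list[str]) -> bool:
    # Stand-in for an LLM call asking "do I need more information?"
    # On an unbounded topic like a security picture, "yes" is always defensible.
    return True  # illustrative worst case

def gather_one_more(notes: list[str]) -> str:
    # Stand-in for a tool call: query another source, chase another thread.
    return f"source #{len(notes) + 1}"

def open_ended_research(max_steps: int | None = None) -> list[str]:
    notes: list[str] = []
    while model_wants_more(notes):
        if max_steps is not None and len(notes) >= max_steps:
            break  # any hard stop has to be imposed from outside the agent
        notes.append(gather_one_more(notes))
    return notes
```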
The framework itself — the fact that Deep Agents is open-ended by design — would produce a fundamentally different output. Not because the model is different, but because the stopping condition is different.
With LangGraph, Daniel's defining a graph. He's saying: here are the nodes, here are the edges, here's where the flow starts, here's where it ends. The structure imposes a stopping condition. The agent doesn't get to decide to keep going — it follows the graph. So the output is shaped by the fact that the pipeline terminates at a defined point. With Deep Agents, you might get a much deeper report on one particular aspect of the security situation — say, a detailed breakdown of movements in the north — but you might never get to the other sections because the agent got fascinated by one branch and couldn't let go.
Daniel's whole goal is to tune out the noise. If the framework is structurally inclined to chase every thread, you've actually built the opposite of what you wanted. You've built a noise amplifier.
That's the irony. He wanted to escape speculative news, and Deep Agents would have given him the most speculative possible output — not because the model is speculating, but because the architecture is designed to never be satisfied with what it has. It's structurally incapable of saying "that's enough."
Let's talk about Pydantic then. Daniel mentioned it as the more code-based approach — thinking in terms of variable flow rather than graph flow. How does that change things?
Pydantic is interesting because it's not really an agent framework in the way LangGraph is. Pydantic is a data validation library. When people say they're using Pydantic for agentic workflows, what they usually mean is they're using it to define structured outputs — the shape of the data that comes out of each step. And that's a fundamentally different way of thinking about the pipeline. With LangGraph, you're thinking about the flow — what happens first, what branches where, what merges back. With a Pydantic-heavy approach, you're thinking about the types. You're defining: at this step, the output must conform to this schema. It must have these fields, these types, these constraints.
The harness is shaping the output by constraining what can be said, not by constraining the path to get there.
And that has real consequences for something like a security report. If Daniel defines a Pydantic model for his summary output — say, it must have a section on rocket alerts, a section on diplomatic developments, a section on military movements, each with specific sub-fields — then the model is forced to populate those fields. It can't decide that today rocket alerts aren't interesting and skip them. The schema is the boss. With LangGraph, he could build a graph that dynamically routes based on what's happening — if there are no alerts today, the graph might skip that node entirely. The output would be shorter, more focused on what's actually happening.
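To picture what that contract might look like, here's a minimal Pydantic sketch. The section and field names are invented for the example, but the key property is real: every required field has to be filled, every single day.

```python
from pydantic import BaseModel, Field

# Illustrative schema -- the section and field names are made up for this example.
class RocketAlerts(BaseModel):
    count: int = Field(ge=0, description="Confirmed alerts in the last 24 hours")
    areas: list[str]

class DailySecurityReport(BaseModel):
    # Every field here is required, so the model must produce something
    # for each section even on a day when nothing happened in it.
    rocket_alerts: RocketAlerts
    diplomatic_developments: str
    military_movements: str

# Typical use: validate the model's JSON output against the contract, e.g.
# report = DailySecurityReport.model_validate_json(llm_output)
```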
That's a really concrete difference. Same model, same sources, same day's news — but the Pydantic approach gives you a report that always has the same sections, always fills every field, even if some of those fields say "no significant developments." The LangGraph approach might give you a report that's three paragraphs on a specific incident and nothing else, because the graph decided that's all that mattered.
Here's where it gets even more interesting. Daniel asked whether these are "opinionated" frameworks, and I think that's exactly the right word. LangGraph is opinionated about process — it thinks the right way to build an agent is to map out the flow as a graph, with explicit nodes and edges. Pydantic is opinionated about output — it thinks the right way to build an agent is to strictly define what comes out at each step. Deep Agents is opinionated about autonomy — it thinks the right way is to let the agent figure out the process and the output.
None of these opinions are about the model. They're all upstream of the model. And yet they change what the user actually reads.
I want to dig into something Daniel said that I think is really important. He said he was impressed with DeepSeek V four, and he noted that the most impressive uses of AI are often created by people who think about the plumbing in interesting ways — not necessarily the highest benchmark or the most GPUs. There's a paper that's been circulating in agent-building circles, and it makes exactly this point. The researchers took the same underlying model and ran it through different agentic frameworks on the same tasks. The variance in output quality attributable to the framework was substantial — in some cases larger than the variance between different models.
The harness matters more than the model?
In specific cases, yes. But think about why. The model is producing tokens. The harness is deciding what context those tokens see, what tools they can call, how the outputs of one step feed into the next, when to stop, when to branch, when to merge. Those are all decisions that shape what the model actually does. A brilliant model inside a poorly designed harness will produce worse results than a mediocre model inside a well-designed harness for a lot of real-world tasks.
That's a humbling thought for the people spending billions on training bigger models.
It should be. And it connects to something Daniel touched on — the idea that there's no prescriptive or defined way of doing anything correctly in agentic AI right now. That's both exciting and terrifying. It's exciting because it means there's room for genuine creativity. Someone with a clever idea about how to structure a graph or how to chain prompts can produce something that outperforms a team with ten times the resources. It's terrifying because it means there's no reliable playbook. You can't just follow best practices — there aren't any yet.
Daniel's security report pipeline is a perfect test case for this. Let's walk through it concretely. He's got a multi-agent setup. What does that actually mean in a LangGraph implementation versus the alternatives?
In LangGraph, multi-agent means multiple nodes in the graph, where each node might be a different agent with a different system prompt and different tools. One agent might be responsible for gathering sources — it queries news APIs, checks official channels, pulls in reports. Another agent might be responsible for filtering — it takes the raw sources and decides what's relevant based on Daniel's criteria. A third agent might handle summarization — turning the filtered information into a readable report. The graph defines how these agents hand off to each other.
The graph can have conditional edges, right? If the filtering agent finds something high-priority, it routes to a different summarization agent than if everything's routine.
That's where the graph approach really shines. You can build in domain logic. If there's an ongoing rocket attack, the graph routes through an emergency path that produces a different kind of output — maybe shorter, maybe with different formatting, maybe with a different tone — than the routine daily summary. That's hard to do with a purely schema-driven approach, because the schema doesn't know about emergencies. It just knows about fields.
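As a rough sketch of that routing in LangGraph, something like the following would do it. The state fields, node names, and the priority check are all illustrative, not Daniel's actual pipeline; the point is that the branch and the stopping point are both written into the structure.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

# Illustrative state and node names -- not Daniel's actual pipeline.
class ReportState(TypedDict):
    raw_items: list[str]
    filtered: list[str]
    priority: str
    report: str

def gather(state: ReportState) -> dict:
    # Stand-in for the source-gathering agent (news APIs, official channels).
    return {"raw_items": ["placeholder item"]}

def filter_items(state: ReportState) -> dict:
    # Stand-in for the filtering agent: applies the builder's relevance criteria.
    return {"filtered": state["raw_items"], "priority": "routine"}

def routine_summary(state: ReportState) -> dict:
    return {"report": "normal daily write-up"}

def emergency_summary(state: ReportState) -> dict:
    return {"report": "short, urgent-format write-up"}

def route_by_priority(state: ReportState) -> str:
    # The domain judgment lives here, in code the developer controls.
    return "emergency" if state["priority"] == "high" else "routine"

builder = StateGraph(ReportState)
builder.add_node("gather", gather)
builder.add_node("filter", filter_items)
builder.add_node("routine", routine_summary)
builder.add_node("emergency", emergency_summary)
builder.add_edge(START, "gather")
builder.add_edge("gather", "filter")
builder.add_conditional_edges("filter", route_by_priority,
                              {"routine": "routine", "emergency": "emergency"})
builder.add_edge("routine", END)
builder.add_edge("emergency", END)
graph = builder.compile()
```

Whichever branch it takes, the run terminates at END. The stopping condition is part of the blueprint, not a decision the agent gets to make.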
The LangGraph approach lets Daniel bake in his judgment about what matters. He's not just defining the output format — he's defining the decision points.
That's the crucial difference. With LangGraph, Daniel's judgment is embedded in the structure of the graph. The graph says: check for these conditions, and if they're true, do this. With Deep Agents, the agent is supposed to exercise judgment on its own — but it doesn't have Daniel's years of experience living in Jerusalem, his understanding of what's actually concerning versus what's routine, his sense of which sources are reliable and which are alarmist. The agent can't replicate that judgment because it's not in the training data. It's in Daniel's head.
The LangGraph approach is actually a way of encoding domain expertise that the model itself doesn't have.
And this gets at something deeper about agentic AI that I think is underappreciated. We talk a lot about giving agents autonomy, but autonomy is only valuable if the agent has good judgment. If it doesn't — and current models don't, not really, not in domain-specific ways — then giving it more autonomy just means giving it more opportunities to be wrong. The smart move is to constrain the autonomy at the points where judgment matters most, and let the model do what it's actually good at: processing text, extracting information, generating fluent summaries.
That's a strong argument for the graph-based approach in general, and it explains why Daniel gravitated toward it naturally. He said it feels logical to think about agentic workflows in terms of graphs — forking and branching. His intuition matches the architecture.
It's not just intuition. There's a reason LangGraph has become one of the dominant frameworks for building agentic systems. LangChain, the company behind it, reported that LangGraph has seen massive adoption because it gives developers explicit control over the flow of information. You're not hoping the agent makes good decisions — you're building the decision points yourself.
Let's push on this. If LangGraph is so great, why would anyone use Pydantic instead? What's the counter-argument?
The counter-argument is reliability and type safety. When you define a Pydantic model for your outputs, you get guarantees. You know that the output will have certain fields, that those fields will be of certain types, that they'll conform to certain constraints. If you're building a pipeline that feeds into another system — say, Daniel's report gets ingested by a dashboard or a database — those guarantees matter. With a purely graph-based approach, the output can be more variable. One day it might be three paragraphs. Another day it might be ten. If your downstream system expects a consistent format, that variability is a problem.
Pydantic gives you consistency at the cost of flexibility. LangGraph gives you flexibility at the cost of consistency.
That's the trade-off in a nutshell. And which one matters more depends entirely on what you're building. For Daniel's use case — a personal daily summary that he reads himself — flexibility probably matters more. He doesn't need every report to have the same sections. He needs the report to highlight what actually matters today. But if he were building this for a newsroom, where the output feeds into a CMS and needs to have a consistent structure, Pydantic might be the better choice.
Let's talk about the failure modes, because Daniel mentioned one explicitly — the recursive loop problem with Deep Agents. What are the failure modes for LangGraph and Pydantic?
For LangGraph, the main failure mode is that your graph is wrong. You built a graph that doesn't capture the right decision logic. Maybe you set a threshold for "high priority" that's too sensitive, and every report gets routed through the emergency path. Maybe you forgot to handle a case, and the graph hits a node it can't process. The agent doesn't know what to do because you didn't tell it. That's different from Deep Agents, where the failure mode is that the agent does too much. With LangGraph, the failure mode is often that the agent does too little — it follows the graph faithfully but the graph is incomplete.
The failure mode for Pydantic is that the schema forces the model to say something even when there's nothing to say. You get hallucination by structural requirement. The schema says there must be a field for "diplomatic developments," but nothing diplomatic happened today — so the model fills it with something vague or speculative, because it has to put something there. That's exactly the kind of speculation Daniel was trying to escape.
That's a really important point. The framework can actually induce the very problem you're trying to solve.
It's not a theoretical concern. I've seen this in practice. People build these elaborate Pydantic schemas for their agent outputs, and they get back reports where every field is populated — but some of those fields are just the model inventing things to satisfy the schema. It's not malicious. The model is just doing what it was asked to do. But the constraint created the hallucination.
Daniel's choice of LangGraph is looking smarter by the minute. He avoids the Deep Agents infinite loop problem because his graph has terminal nodes. He avoids the Pydantic hallucination problem because his graph can skip sections when there's nothing to report.
There's a cost. The cost is that Daniel has to maintain the graph. Every time the security situation changes in a way he didn't anticipate, he has to update the graph. New type of threat? New source of information? The graph is a living thing, and it requires ongoing attention. With Deep Agents, you just set the goal and let it run. With Pydantic, you define the schema once and it's relatively stable. With LangGraph, you're signing up for active maintenance.
That's the trade-off of encoding judgment. Judgment changes, so the encoding has to change too.
This is where I think the term "opinionated framework" really earns its weight. When Daniel chooses LangGraph, he's not just choosing a technical tool. He's choosing a philosophy about where the judgment lives. In LangGraph, the judgment lives in the graph — it's explicit, it's maintained by the developer, it's auditable. In Deep Agents, the judgment lives in the model — it's implicit, it emerges from the training data, it's hard to audit. In Pydantic, the judgment lives in the schema — it's structural, it's enforced by the type system, it's rigid.
Three different answers to the question: who decides what matters?
And I think for Daniel's use case — a personal security report where his own judgment about what matters is the whole point — the graph approach is clearly the right one. He's not trying to build a general-purpose news summarizer. He's trying to build a pipeline that reflects his specific priorities, his specific concerns, his specific knowledge of the region. The graph is how he encodes all of that.
Let's zoom out for a second. Daniel's question touches on something bigger about the state of AI right now. He said he loves that it's a creative space, that technical and creative don't usually go hand in hand. Is that actually true about agentic AI, or is that just the honeymoon phase?
I think it's true, and I think it will remain true for a while. The reason is that agentic AI is fundamentally a design problem, not a pure engineering problem. When you're building an agent pipeline, you're making decisions about flow, about constraints, about where to put the human judgment and where to let the model roam free. Those are design decisions. They have technical implications, but they're not purely technical. They're about what you value, what you're optimizing for, what kind of output you want.
The space of possible designs is enormous. That's what creates the room for creativity.
If there were one obviously correct way to build an agent pipeline, there would be no creativity. You'd just follow the playbook. But there isn't. There are trade-offs everywhere. Speed versus thoroughness. Flexibility versus consistency. Autonomy versus control. Every pipeline is a set of answers to those trade-offs, and the answers depend on the specific thing you're building and who you're building it for.
Daniel also mentioned something about people objecting to the term "architecture" in technology. I think I know what he means — there's a contingent that thinks architecture is a word for buildings, and using it for software is pretentious.
I disagree with those people, and I think agentic AI is actually the strongest argument for using the term architecture. When you design a building, you're making decisions about flow — how people move through the space, where they enter, where they exit, what they see first. When you design an agent pipeline, you're making exactly the same kinds of decisions about information flow. The graph is the floorplan. The nodes are the rooms. The edges are the hallways. It's architecture in the most literal sense.
Just like architecture, the design shapes the experience in ways that aren't always obvious from the blueprint. Two buildings with the same materials can feel completely different because of how they're laid out. Two agent pipelines with the same model and the same prompts can produce completely different outputs because of how they're structured.
That's the point Daniel is making, and it's a point that deserves more attention than it gets. The AI industry is obsessed with models. Every week there's a new model, a new benchmark, a new claim about reasoning or coding or whatever. But for people actually building things — people like Daniel — the model is just one component. The architecture is what makes it useful.
Let's get practical. If someone's listening to this and they're thinking about building their own agent pipeline, what should they be asking themselves when they choose a framework?
First: where does the judgment live? If the judgment needs to be explicit and auditable, you want a graph-based approach. If the judgment can be emergent, you might be fine with something more autonomous. Second: how much does the output shape matter? If you need consistent, machine-readable output, you want strong typing — Pydantic or something like it. If the output is for humans and can vary, you can be looser. Third: how much are you willing to maintain? Graphs require maintenance. Schemas require maintenance. Autonomous agents require less maintenance but give you less control.
For Daniel's specific case?
He made the right call. A security situational report is exactly the kind of thing where judgment matters, where the output is for a human, and where the person building it has domain expertise that the model lacks. LangGraph lets him encode that expertise directly into the pipeline. He's not just using AI — he's teaching the pipeline to think the way he thinks about security information. That's the whole game.
There's a flip side to this, though. Daniel's pipeline reflects his judgment. That means it also reflects his blind spots. The graph can only route based on conditions he thought to include.
That's the limitation of any opinionated framework. The opinions are only as good as the person who baked them in. If Daniel has a blind spot — say, he underestimates the importance of a particular type of signal, or he overweights a particular source — the graph will faithfully reproduce that blind spot every day. An autonomous agent might accidentally surface something he would have missed. The graph won't.
There's a case for hybrid approaches. Use the graph for the structure, but leave some nodes where the agent has more freedom to explore.
That's actually what a lot of the best pipelines do. The graph handles the high-level flow — the things you're confident about. But within a node, you might give the agent more latitude. You might say: here's a set of sources, here's the general topic, summarize what's important. You're not constraining the output format, you're just constraining the scope. That gives you the best of both worlds — control where you need it, flexibility where you can afford it.
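Concretely, a hybrid node can be as simple as the sketch below, where call_model stands in for whatever chat client the pipeline actually uses. The graph pins down the scope; the prompt leaves the format open.

```python
def call_model(prompt: str) -> str:
    # Stand-in for whatever chat-completion client the pipeline actually uses.
    return "model output"

def northern_front_node(state: dict) -> dict:
    # The graph fixes the scope: this node only ever covers one topic.
    # Within that scope, the model gets latitude over length and emphasis --
    # no output schema, no fixed section list.
    prompt = (
        "Here are today's sources on the northern front:\n"
        + "\n".join(state["filtered"])
        + "\n\nSummarize what is important. Use your judgment about length "
        "and emphasis, but stay within this topic."
    )
    return {"report": call_model(prompt)}
```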
Let's circle back to something Daniel said at the beginning. He's building this because he finds the news frustrating — full of speculation, light on concrete information. And his goal is to tune out the news with confidence. Does an agent pipeline actually solve that problem?
It can, but only if the pipeline is designed to be anti-speculative. And that's not a property of the model — it's a property of the instructions, the sources, and the structure. If you feed the pipeline sources that are themselves speculative, and you don't have a filtering step that screens for concreteness, you're just automating the ingestion of speculation. The pipeline has to actively resist the thing Daniel is trying to escape.
That resistance has to be designed in. It won't emerge on its own.
You need a node in the graph that specifically evaluates sources for concreteness. You need prompts that tell the model: prefer official statements over analyst commentary, prefer confirmed facts over projections, prefer specifics over generalities. You need to define what "concrete information" means in the context of security reporting, and then enforce that definition structurally.
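In practice that can be as plain as a screening instruction carried by one node in the graph. The wording below is illustrative, but it's the kind of definition that has to be written down rather than hoped for.

```python
# Illustrative screening instruction for a "concreteness filter" node.
# The wording is an example; the point is that the preference for confirmed,
# specific information is encoded in the pipeline rather than left to defaults.
CONCRETENESS_PROMPT = """You are screening items for a daily security report.
Keep an item only if all of the following hold:
- It describes something that has already happened or is officially confirmed.
- It cites an official statement, alert, or verifiable on-the-ground report.
- It contains specifics: place, time, scale.
Discard analyst speculation, projections, and unsourced commentary.
Return the kept items verbatim, one per line."""
```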
That's a whole separate design problem. And it's one where the framework choice matters less than the prompt engineering and the source selection.
Which brings us back to the core point. The framework matters, but it's not the only thing that matters. Daniel's question was: if everything else is equal — same model, same tools, same prompts — how different would the output be with a different framework? And the answer is: meaningfully different, but not in every dimension. The framework shapes the structure of the output, the consistency, the stopping behavior, the handling of edge cases. But it doesn't shape the fundamental quality of the information — that comes from the sources and the prompts.
The framework is necessary but not sufficient.
A great framework with bad prompts will produce bad output. A bad framework with great prompts will also produce bad output, but in different ways. You need both.
I want to push on one more thing. Daniel mentioned that he's increasingly impressed by people who find more efficient ways to train and better ways to harness parts together. There's a whole movement right now around making agentic AI more efficient — fewer tokens, cheaper runs, faster outputs. Does the framework choice affect efficiency?
It does, and this is where Deep Agents really falls down for a lot of use cases. The recursive loops Daniel mentioned aren't just annoying — they're expensive. Every "tell me more" is another API call, another round of token generation. I've seen reports of people burning through hundreds of dollars in a single Deep Agents session because the agent couldn't decide it was done. LangGraph is inherently more efficient because the graph has a defined endpoint. It's going to run a predictable number of steps.
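And if memory serves, LangGraph also lets you put a hard ceiling on a single run through a recursion limit in the invocation config; the exact key is worth checking against the current docs. Reusing the graph from the earlier sketch, it would look roughly like this.

```python
# Assumes the recursion_limit config key (verify against current LangGraph docs):
# the run raises an error if it exceeds the step ceiling instead of burning tokens.
result = graph.invoke(
    {"raw_items": [], "filtered": [], "priority": "", "report": ""},
    config={"recursion_limit": 25},
)
```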
Pydantic itself doesn't really affect efficiency — it's just defining the shape of the output. But the development style it encourages can be more efficient. If you're thinking in terms of typed outputs, you tend to build pipelines with fewer steps, because each step is doing more structured work. You're not chaining ten agents together — you might have two or three, each producing a well-defined structured output.
There's a development efficiency and a runtime efficiency, and they don't always point in the same direction.
LangGraph gives you runtime efficiency — predictable costs, predictable duration. Pydantic might give you development efficiency — faster to build, easier to reason about. Deep Agents gives you neither — it's expensive to run and hard to debug. But it gives you something else: the possibility of discovering things you didn't know to look for.
For Daniel's use case, that possibility isn't worth the cost. He knows what he's looking for. He's not exploring — he's monitoring.
And that distinction — exploring versus monitoring — is another way to think about framework choice. If you're exploring an open-ended question, you might want an autonomous agent that can chase threads. If you're monitoring a known set of concerns, you want a structured pipeline that reliably covers those concerns and stops.
Alright, let's land this. If someone's building an agentic pipeline and they're staring at the framework choices, what's the one thing you'd tell them to think about?
I'd tell them to think about what they're optimizing for, and be honest about it. Are you optimizing for open-ended thoroughness? Then an autonomous agent might earn its cost. Are you optimizing for consistency? Go with strong typing. Are you optimizing for control and auditability? Go with graphs. There's no universally correct answer. The correct answer depends on what you're building and who it's for.
If you're Daniel, building a personal security report where your own judgment is the secret sauce, you go with the graph.
You go with the graph. And you accept that you're signing up for maintenance, because your judgment will evolve, and the graph will need to evolve with it.
One last thing. Daniel said he loves that AI is a creative space, that there's often no prescriptive way of doing things correctly. Do you think that lasts? Or does the field eventually converge on best practices the way every other engineering discipline has?
I think it converges, but I think it converges slowly, and I think the convergence will be around patterns rather than specific frameworks. People will converge on the idea that you should separate the flow control from the output formatting. They'll converge on the idea that you should have explicit stopping conditions. They'll converge on the idea that domain expertise should be encoded in the structure, not just in the prompts. But those patterns can be implemented in LangGraph, or Pydantic, or whatever comes next. The frameworks will change. The principles will stick.
Learn the principles, not just the tools.
The tools are temporary. The principles — where does judgment live, how do you handle edge cases, how do you balance control and autonomy — those are permanent. Daniel's pipeline will outlive LangGraph. If he's built it on solid principles, he'll be able to rebuild it in whatever framework dominates five years from now.
And now: Hilbert's daily fun fact.
Hilbert: The collective noun for a group of porcupines is a prickle.
That's actually kind of perfect.
Here's the thing I keep thinking about after this conversation. Daniel built something because he was frustrated with the news. He wanted information he could trust, shaped by his own judgment about what matters. And the tool he used — LangGraph, an agentic framework — let him do that. But the deeper point is that he didn't need a better model. He needed a better way to use the models we already have. That's the story of agentic AI right now. The frontier isn't in the models. It's in the architecture around them.
That's going to be true for a while. The models will keep improving, but the gap between what a model can do in theory and what it does in practice is determined by the harness. The people who build the best harnesses will get the best results, regardless of which model they're using.
Thanks as always to our producer Hilbert Flumingtop for keeping this show running. This has been My Weird Prompts. Find us at myweirdprompts dot com, and if you're building something interesting with agentic AI, we'd love to hear about it.
Until next time.