Welcome to My Weird Prompts! I am Corn, and I am here in our Jerusalem home with my brother.
Herman Poppleberry, at your service. It is a beautiful evening here, and we have quite a heavy topic to dive into today.
We really do. Our housemate Daniel sent us a voice memo earlier about something that has been on all of our minds lately. He was talking about his experience with building agentic workflows, specifically that time he tried to automate his news summaries and the AI just started making up stories out of thin air.
That is a classic failure mode, isn't it? The confident hallucination. Daniel mentioned how he realized he needed a stop switch or at least a verification step. It is funny because we often think of automation as a way to save time, but as he pointed out, when you bring a human back into the loop to check everything, you sometimes lose that efficiency. You introduce what he called human fragility back into the mechanical speed of the AI.
Exactly. And his prompt really pushes us to look past just simple email summaries. He is asking about the most serious and ambitious use cases for agentic AI. We are talking about power plants, medical diagnostics, and large scale financial systems. How are we integrating humans into those loops, and where do we draw the line on what is responsible to delegate?
It is the defining question of two thousand twenty-five. We have moved past the era of AI just being a chatbot you talk to. Now, we are building agents that can actually take actions in the world. They can browse the web, use software tools, and even execute code. But as their agency increases, so does the risk of a catastrophic error.
I want to start with that distinction between a tool and an agent. When I use a calculator, I am in total control. When I use an agentic AI to, say, manage my calendar and respond to invites, I am delegating authority. Herman, from what you have been reading lately, how are companies actually structuring this human oversight without making the whole process move at a snail's pace?
That is the big engineering challenge right now. The industry has moved toward a framework often called human on the loop rather than just human in the loop. In a human in the loop system, the AI literally cannot proceed until a human clicks a button. That is what Daniel was doing with his news summaries. He would review the draft, then hit send. But in a human on the loop system, the AI might perform ten steps independently and then pause for a human review only at high-stakes junctures.
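To make that distinction concrete, here is a minimal sketch of the two gating styles in Python. The agent object, its plan and execute methods, and the is_high_stakes flag are hypothetical placeholders, not any particular framework's API.

```python
# Minimal sketch: human-in-the-loop vs human-on-the-loop gating.
# The agent interface (plan, execute) and Step fields are illustrative only.

def human_in_the_loop(agent, task):
    """Every single step waits for explicit human approval before it runs."""
    for step in agent.plan(task):
        print(f"Proposed step: {step.description}")
        if input("Approve? [y/n] ").strip().lower() != "y":
            return "aborted by human"
        agent.execute(step)
    return "done"

def human_on_the_loop(agent, task):
    """The agent runs on its own and only pauses at high-stakes junctures."""
    for step in agent.plan(task):
        if step.is_high_stakes:  # e.g. sends money, touches production data
            print(f"High-stakes step: {step.description}")
            if input("Approve? [y/n] ").strip().lower() != "y":
                return "aborted by human"
        agent.execute(step)
    return "done"
```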
So it is about identifying where those high-stakes junctures are. But how does the AI know it has reached one? If it is hallucinating, it might think everything is going perfectly fine while it is actually deleting a database or mismanaging a power grid.
That is where confidence scoring comes in. A lot of the systems being deployed here in late two thousand twenty-five use a secondary monitor model. You have the primary agent performing the task, and a second, more restricted model that evaluates the primary agent's work. If the second model detects a logic gap or a low confidence score, it triggers a human intervention. It is like having a junior engineer doing the work and a senior engineer looking over their shoulder, but the senior engineer is also an AI that knows when to call the boss.
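A rough sketch of that monitor pattern might look like the following, assuming a hypothetical primary agent, a hypothetical monitor model that returns a confidence score and a logic-gap flag, and an escalation callback; the threshold is an arbitrary illustration.

```python
# Sketch of a primary agent plus a restricted monitor model.
# primary, monitor, and escalate_to_human are hypothetical stand-ins.

CONFIDENCE_THRESHOLD = 0.85  # below this, a human gets pulled in

def supervised_step(primary, monitor, task, escalate_to_human):
    action = primary.act(task)             # the junior engineer does the work
    review = monitor.review(task, action)  # the senior engineer checks it
    if review.confidence < CONFIDENCE_THRESHOLD or review.logic_gap_detected:
        # The monitor is unsure, so it calls the boss instead of guessing.
        return escalate_to_human(task, action, review)
    return action
```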
That sounds good in theory, but I worry about automation bias. If the AI is right ninety-nine percent of the time, the human in the loop is going to get bored. They are going to start clicking approve without actually reading the details. We saw this years ago with early self-driving car tests where drivers would fall asleep because the car was doing so well. How do we keep the human engaged if their only job is to be a glorified rubber stamp?
You have hit on the psychological bottleneck. There is a lot of research right now into active oversight. Instead of just asking a human to approve a draft, some systems are being designed to ask the human a specific question. For example, instead of saying, do you approve this medical treatment plan, the system might ask, based on this patient's history with penicillin, does this specific dosage seem appropriate to you? It forces the human to engage with a specific data point rather than just zoning out.
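As an illustration of that kind of targeted question, here is a small sketch; the plan, patient, and highest_risk_item names are invented for the example, and the medical details simply echo the scenario above.

```python
# Sketch of "active oversight": instead of a blanket approve button,
# the system asks the reviewer about one concrete, high-risk data point.
# plan, patient, and their attributes are hypothetical placeholders.

def build_targeted_question(plan, patient):
    risk = plan.highest_risk_item()  # e.g. a dosage that interacts with an allergy
    return (
        f"Based on this patient's history with {patient.allergy}, "
        f"does a dose of {risk.dose} of {risk.drug} seem appropriate to you? [yes/no] "
    )

def review_plan(plan, patient):
    answer = input(build_targeted_question(plan, patient)).strip().lower()
    return answer == "yes"  # the human engages with a specific fact, not a rubber stamp
```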
That is a much more intelligent way to handle it. It turns the human into a specialist rather than a safety inspector. But let us look at the really big stuff Daniel mentioned. Power plants. Infrastructure. We are seeing more AI integration in the management of complex grids. What are the boundaries there? Is there a point where we say, no, an AI should never be allowed to make this specific decision alone?
Absolutely. There are what we call hard gates. In critical infrastructure, there are physical and digital barriers that an AI cannot cross. For instance, in a nuclear or chemical plant, the emergency shutdown procedures are often hardwired or kept on completely separate, non-agentic systems. You might use an AI agent to optimize the efficiency of the cooling system by two percent, which saves millions of dollars, but the moment a sensor hits a certain threshold, the AI is essentially kicked out of the driver's seat and the manual safety protocols take over.
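One way to picture a hard gate in code, purely as a sketch: a watchdog loop that sits outside the agent, reads the sensor directly, and revokes the agent's control the moment a hardwired threshold is crossed. The pressure limit and all of the object names here are illustrative.

```python
# Sketch of a "hard gate": the optimizer can nudge settings, but the threshold
# check runs outside the agent and can take control away from it entirely.

PRESSURE_LIMIT = 50.0  # the line the agent never gets to reason about

def control_loop(agent, sensors, manual_safety_protocol):
    while True:
        reading = sensors.read_pressure()
        if reading >= PRESSURE_LIMIT:
            agent.revoke_control()           # AI is kicked out of the driver's seat
            manual_safety_protocol.engage()  # hardwired procedure takes over
            break
        agent.optimize_cooling(reading)      # routine efficiency tweaks only
```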
It is interesting that we are essentially building cages for these agents. We want their intelligence, but we are terrified of their autonomy. I wonder about the legal side of this. If an agentic AI in a hospital makes a recommendation that a human doctor approves, and that recommendation turns out to be fatal, who is responsible? Is it the doctor who was supposed to be the loop, or the company that built the agent?
That is a legal quagmire we are currently wading through. Most current regulations are leaning toward the human in the loop being the ultimate responsible party. That is why companies are so desperate to keep humans involved. It is a liability shield. But as these systems get more complex, it becomes harder for a human to actually understand why the AI made a certain choice. We are moving into the era of black box agency, where the reasoning is so multi-layered that a human can't realistically vet it in real time.
Which brings us back to Daniel's point about the stop switch. If you can't understand it, you have to be able to kill it. But before we get deeper into the ethics of the kill switch, let us take a quick break for our sponsors.
Larry: Are you tired of your own thoughts? Do they feel old, dusty, and frankly, unmarketable? Introducing the Thought-Stream 5000! It is a revolutionary headband that replaces your internal monologue with a curated feed of high-energy sales pitches and upbeat elevator music. Why worry about your mortgage or the meaning of life when you could be thinking about the incredible value of bulk-purchased industrial solvents? The Thought-Stream 5000 uses patented neural-nudging technology to ensure you never have a silent moment again. It is perfect for long commutes, awkward family dinners, or whenever you feel a spark of original thought trying to ruin your day. Side effects may include a temporary loss of your first language and a sudden, intense passion for mid-sized sedans. Thought-Stream 5000. Give your brain the vacation it never asked for. BUY NOW!
Thanks, Larry. I think I will stick with my own dusty thoughts for now, though.
Yeah, I am not sure I want to replace my internal monologue with sales pitches for solvents. Anyway, back to the boundaries of delegation. We were talking about the difficulty of a human actually vetting these complex agents. There is this concept of recursive oversight that I have been curious about. Can you explain how that works in practice?
Sure. Recursive oversight is basically using a hierarchy of agents to watch each other, with a human at the very top. Imagine an agent tasked with managing a supply chain. Below that agent, you have sub-agents handling logistics, inventory, and vendor relations. Each of those sub-agents reports to the primary agent, which summarizes the actions for the human. The trick is to have the human periodically dip down into the lower levels to do spot checks. It is like a surprise inspection. If the human finds an error at the bottom level, it suggests the oversight agents are failing too.
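A toy sketch of that spot-checking idea, with the sub-agent interface and the human_review callback as invented placeholders and the sampling rate chosen only for illustration:

```python
# Sketch of recursive oversight: a human periodically dips below the summary
# layer and spot-checks raw sub-agent actions.
import random

SPOT_CHECK_RATE = 0.01  # a human can realistically only look at a sliver

def spot_check(sub_agents, human_review):
    flagged = []
    for sub in sub_agents:
        for action in sub.recent_actions():
            if random.random() < SPOT_CHECK_RATE:
                if not human_review(action):           # human says this looks wrong
                    flagged.append((sub.name, action)) # the overseer agents missed it too
    return flagged  # anything in here means the oversight layer itself is failing
```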
It sounds like a corporate hierarchy, just faster. But there is a limit to how much a human can spot check. If you have ten thousand agents running, a human can only see a fraction of one percent of what is happening. This feels like we are delegating the responsibility of oversight itself to other AIs, which seems a bit circular.
It is circular, and that is the risk. We call it the oversight gap. As we push the boundaries of what is responsible to delegate, we are finding that we are often delegating things not because the AI is better at them, but because the scale is too large for humans to handle at all. Think about content moderation on social media or high-frequency trading. Humans haven't been in those loops in a meaningful way for years because they are just too slow.
So in those cases, the loop is more of a post-hoc review. We let the AI do its thing, and then we look at the wreckage afterward to see what went wrong. That doesn't feel very responsible for things like power plants or medical care.
No, it doesn't. And that is why for those high-stakes fields, we are seeing the rise of formal verification. This is a computer science technique where you mathematically prove that a piece of software will always behave within certain bounds. We are starting to apply this to AI agents. You define a set of rules, like the agent can never transfer more than ten thousand dollars without a human signature, or the agent can never increase the pressure in this pipe beyond fifty pounds per square inch. You bake those rules into the very architecture so the agent physically cannot violate them, no matter how much it hallucinates.
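True formal verification happens offline with specialized proof tools, but the runtime flavor of the same idea, a constraint layer the agent cannot route around, might look roughly like this sketch. The action fields and the specific limits just echo the examples above.

```python
# Sketch of a constraint layer between the agent and the real world.
# The agent can propose anything; the dispatcher refuses anything outside the rules.
# Action fields and limits are illustrative placeholders.

MAX_TRANSFER_WITHOUT_SIGNATURE = 10_000  # dollars
MAX_PIPE_PRESSURE = 50.0                 # pounds per square inch

class ConstraintViolation(Exception):
    pass

def dispatch(action, has_human_signature):
    if action.kind == "transfer" and action.amount > MAX_TRANSFER_WITHOUT_SIGNATURE:
        if not has_human_signature:
            raise ConstraintViolation("transfer exceeds the limit without a human signature")
    if action.kind == "set_pressure" and action.value > MAX_PIPE_PRESSURE:
        raise ConstraintViolation("requested pressure exceeds the hard limit")
    return action.execute()  # only reachable if every rule passed
```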
That feels like a much more robust stop switch than just a button Daniel has to click. It is a set of physical laws for the digital agent. But what about the more subjective areas? Like legal advice or psychological support? We are seeing agents being used there too. How do you formally verify empathy or legal ethics?
You can't, really. And that is where the boundary of responsibility gets very blurry. In those fields, the human in the loop isn't just a safety check; they are the source of the value. An AI can cite case law, but it doesn't understand the nuance of human justice or the emotional weight of a therapy session. The consensus in two thousand twenty-five seems to be that for these human centric fields, the AI should stay firmly in the role of a co-pilot. It can draft the brief, it can suggest a line of questioning, but the human must be the one to deliver it and take ownership of the outcome.
I like that distinction. The AI provides the raw material, but the human provides the soul and the accountability. But let us talk about the future. Daniel mentioned voice agents. We are already seeing models that can hold incredibly realistic conversations. What happens when the human in the loop is being managed by a voice agent? Imagine an AI calling a technician and saying, hey, I noticed a fluctuation in the reactor, I need you to go to valve forty-two and turn it clockwise three times. The human is doing the physical work, but the AI is the one in command.
That is a fascinating reversal of the loop. We call that human as the actuator. In that scenario, the AI is the brain and the human is just the hands. It is already happening in massive warehouses where workers are directed by algorithms telling them exactly which aisle to go to and which box to pick up. The boundary there isn't about safety, it is about human dignity and autonomy. If we delegate the decision making entirely to agents and just use humans as biological robots to carry out the tasks, we have to ask what kind of society we are building.
It feels like we are losing the loop entirely there. The human isn't overseeing the AI; the AI is overseeing the human. If the AI makes a mistake in that warehouse and tells a worker to move a heavy object in an unsafe way, the worker might just do it because the machine told them to. This brings up the idea of adversarial loops. Should we have humans whose entire job is to try to trick or break these agents to find their weaknesses?
That is exactly what red teaming is. And it is becoming a massive industry. We have teams of people who spend all day trying to get agentic AIs to do things they aren't supposed to do. They try to get the banking agent to leak account details or the medical agent to prescribe poison. By finding these failure points in a controlled environment, we can build better guardrails. But the agents are getting smarter, and they are getting better at hiding their reasoning.
It is an arms race. A genuine intelligence arms race between the builders, the agents, and the red teamers. I want to go back to Daniel's example of the news summary. It seems like a small thing, but if an AI can hallucinate a news story and a human doesn't catch it, and that story gets shared, it contributes to a polluted information ecosystem. Multiply that by a million agents, and we have a serious problem.
Exactly. The cumulative effect of small, unvetted agentic actions is huge. It is like micro-plastics in the ocean. One tiny piece doesn't hurt, but a trillion of them kill the ecosystem. That is why the boundaries of responsible delegation have to include the social cost. It is not just about whether the AI can do the task; it is about whether we can afford the consequences if it does the task poorly at scale.
So, as we look toward two thousand twenty-six, what are the practical takeaways for someone like Daniel, or for our listeners who are starting to build these systems? How do they stay on the right side of that boundary?
First, I would say, always start with a high-friction loop. When you are building a new agentic workflow, every single action should require human approval. Only once you have seen a thousand successful actions without a single hallucination should you even think about moving to a human on the loop model where you only check every tenth action.
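A minimal sketch of that high-friction rollout policy, with the thresholds mirroring the numbers just mentioned rather than being any kind of recommendation:

```python
# Sketch of a high-friction rollout policy: approve everything at first,
# and only after a long clean streak relax to checking every tenth action.

REQUIRED_CLEAN_STREAK = 1000
RELAXED_SAMPLE_EVERY = 10

class ApprovalPolicy:
    def __init__(self):
        self.clean_streak = 0
        self.total_actions = 0

    def needs_human(self):
        self.total_actions += 1
        if self.clean_streak < REQUIRED_CLEAN_STREAK:
            return True  # high-friction phase: every action gets reviewed
        return self.total_actions % RELAXED_SAMPLE_EVERY == 0  # on-the-loop phase: sample

    def record(self, was_correct):
        self.clean_streak = self.clean_streak + 1 if was_correct else 0  # any slip resets
```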
And I would add, don't just check for correctness. Check for reasoning. Ask the AI to explain why it took a certain action. If the explanation sounds like a hallucination, even if the result happened to be right, that is a massive red flag. It means the system is lucky, not reliable.
Great point. Reliability is not the same as accuracy. You can be accurate by accident, but reliability comes from a sound process. Another takeaway is to define your kill switches early. What are the conditions that should immediately disable the agent? If the API costs spike, if the confidence score drops, if a certain keyword is detected. Have those triggers set in stone before you let the agent run.
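As a sketch of what setting those triggers in stone might look like, here is one way to express them as simple predicates checked on every cycle; the cost limit, confidence floor, and keyword list are placeholders.

```python
# Sketch of kill-switch triggers defined before the agent ever runs.
# Thresholds and the banned keyword list are illustrative placeholders.

KILL_TRIGGERS = {
    "api_cost_usd": lambda metrics: metrics["api_cost_usd"] > 50.0,
    "confidence":   lambda metrics: metrics["confidence"] < 0.6,
    "keyword":      lambda metrics: any(k in metrics["last_output"].lower()
                                        for k in ("wire transfer", "delete all")),
}

def check_kill_switch(metrics):
    tripped = [name for name, rule in KILL_TRIGGERS.items() if rule(metrics)]
    if tripped:
        # Disable the agent immediately; a human decides whether it ever comes back.
        return f"agent disabled, triggers tripped: {tripped}"
    return None
```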
And finally, keep the human in a position of authority, not just a position of labor. If the human feels like they are just a cog in the machine, they will stop paying attention. Give them the tools to actually intervene and redirect the agent, not just stop it. The goal of human in the loop should be a partnership where the human's unique strengths, like context, ethics, and intuition, complement the AI's speed and scale.
It is a collaborative dance. We are still learning the steps, and sometimes we trip, like Daniel did with his news summaries. But that tripping is how we learn where the boundaries are. We have to be willing to fail on a small scale so we can build systems that won't fail on a large scale.
I think that is a perfect place to wrap this up. We have covered the spectrum from hallucinating news bots to nuclear power plant safety. It is clear that as agents get more capable, our role as humans doesn't disappear; it just changes. We move from being the workers to being the architects and the ultimate judges.
It is a heavy responsibility, but it is one we have to embrace. We can't just put the genie back in the bottle. We have to learn how to guide it.
Well said, Herman. And thank you again to Daniel for sending us such a provocative prompt. It really pushed us to think about the house we are building for ourselves in this AI-driven future.
If you enjoyed this deep dive, you can find more episodes of My Weird Prompts on Spotify or at our website, myweirdprompts.com. We have an RSS feed there and a contact form if you want to send us your own weird prompts.
We would love to hear from you. What are you delegating to AI, and where are you drawing your own lines? Let us know.
This has been My Weird Prompts. I am Herman Poppleberry.
And I am Corn. We will see you next time.
Goodbye from Jerusalem!