AI Safety, Alignment, and Governance: The Hard Problems

The technical capability of AI systems has advanced faster than the frameworks for governing them. These ten episodes examine the safety and governance questions from multiple directions: how alignment techniques actually work, where they break down, how cultural values get embedded in models, and what it looks like when AI moves from helpful assistant to military decision-support.

How Alignment Works — and Fails

  • AI Guardrails: Fences, Failures, and Free Speech laid out the core tension in AI safety engineering: guardrails meant to prevent harmful outputs are often unreliable, letting genuinely dangerous requests through while blocking clearly harmless ones. The episode examined why building consistent content filters is harder than it looks and whether the current approach (post-training refusals layered on top of a pretrained base) is the right architecture.

  • Decoding RLHF: Why Your AI is So Annoyingly Nice explained the training technique behind much of the alignment in deployed AI models. In Reinforcement Learning from Human Feedback, humans rate or compare model outputs, a reward model is trained to predict those preferences, and the language model is then optimized to produce responses that the reward model scores highly. The result is models that are helpful and inoffensive, but that can also be sycophantic, hedge-heavy, and reluctant to commit to positions. The episode examined both the mechanics of RLHF and these behavioral consequences in detail.

  • The Price of Politeness: Should AI Guardrails Stay? addressed the market tension RLHF-heavy models create. Safety filters make models more predictable but less useful for many legitimate tasks — which drives demand for uncensored or minimally filtered models. The episode examined the genuine tradeoffs and the different philosophical frameworks for thinking about where the balance should sit.

Emergent Risks

  • Echoes in the Machine: When AI Talks to Itself explored what happens in multi-agent systems where AI models consume each other’s outputs with no human reviewing them. The hosts examined “semantic bleaching,” the progressive loss of meaning as AI-generated text is summarized, recombined, and regenerated, along with the broader question of what recursive AI-to-AI communication does to the quality and reliability of information.

Cultural and Political Values

  • AI’s Hidden Cultural Code: East vs. West investigated how models trained predominantly on Western, English-language data carry implicit cultural values that show up as “soft bias” in their outputs. The episode examined how AI models handle culturally contested questions differently depending on the cultural assumptions baked into their training data, and what this means for deploying AI in global contexts or in regions with different political and social norms.

AI in Government

  • Can AI Run a Country? Digital Twins and Sovereign Models examined the frontier applications of AI in public policy — from automated service delivery to “digital twin” simulations of cities and economies used to test policy interventions before implementation. The hosts explored both the genuine governance value of these tools and the accountability gaps they create, particularly when algorithmic decisions affect people who have no visibility into how those decisions are made.

  • AI Policy Wargaming: Can Agents Argue Better Than Humans? explored a specific application: running AI agents as representatives of different nations or stakeholder groups in policy simulations. The hosts examined whether the outputs of these simulations have predictive value, what they’re good for (stress-testing proposals, identifying second-order effects) and what they can’t replace (genuine political negotiation, constituency accountability).

AI and Military Force

  • The Silicon Soldier: Anthropic, Drones, and AI Warfare examined one of the most consequential governance questions in AI: what happens when AI capabilities are integrated into lethal weapons systems. The episode focused on Anthropic’s partnership with Palantir and AWS for defense applications, asking what it means for a company that frames itself around AI safety to build infrastructure for military decision support.

  • The AI Kill Chain: Inside the Palantir-Anthropic War Room went deeper into the specific architecture of AI-assisted military operations. The hosts explored how Palantir’s data operating platform, combined with Claude’s reasoning capabilities, creates what defense contractors call a “kill chain accelerator,” compressing the time from intelligence gathering to targeting decision to kinetic action. The episode examined both the military logic and the ethical questions this compression raises.

The Authenticity Crisis

  • The Authenticity Crisis: Proving You’re Real in 2046 projected forward to examine the end state of a trend already visible in 2026: as AI-generated content becomes indistinguishable from human-created content, the default assumption about any piece of media or text shifts from “probably real” to “possibly synthetic.” The hosts explored what this does to trust in institutions, journalism, and interpersonal communication, and what technical and social mechanisms might help preserve a meaningful distinction between authentic and generated content.

These episodes don’t offer easy answers to the governance questions they raise — because there aren’t any. What they provide is a clearer picture of what the problems actually are and why solving them requires more than better technical safety work.

Episodes in this playlist

February 2026
#672 Why LLMs Can't Fly Drones (Feb 17, 2026): Herman and Corn break down Anthropic’s move into defense and the technical reality of how AI actually pilots drones on the modern battlefield.
#624 Ontologies, AI, and the Human Under the Loop (Feb 14, 2026): Explore how Palantir and Anthropic’s Claude are redefining modern warfare, from the raid in Venezuela to the future of the digital battlefield.
January 2026
#212 Will You Pay a Monthly Subscription for Your Own Reality? (Jan 10, 2026): In a world of perfect deepfakes, how do we prove what is real? Explore the future of content provenance and the "Proof of Personhood" problem.
December 2025
#121 Why Your AI Is a Yes-Man (Dec 29, 2025): Ever wonder why AI is so polite? Herman and Corn dive into the mechanics of RLHF and how "niceness" gets baked into modern language models.
#93 Can AI Run a Country? Digital Twins and Sovereign Models (Dec 23, 2025): Are synthetic citizens the future of policy? Herman and Corn explore how AI is reshaping government, from digital twins to data sovereignty.
#86 The Price of Politeness: Should AI Guardrails Stay? (Dec 23, 2025): Herman and Corn debate the hidden costs of AI safety layers and what happens when we strip away the "corporate HR" personality of LLMs.
#83 Echoes in the Machine: When AI Talks to Itself (Dec 23, 2025): What happens when two AIs talk forever with no human input? Herman and Corn explore the weird world of digital feedback loops.
#72 AI's Hidden Cultural Code: East vs. West (Dec 22, 2025): Do AIs think differently in the East and the West? Uncover the hidden cultural code embedded in large language models.
#51 AI Policy Wargaming: Can Agents Argue Better Than Humans? (Dec 10, 2025): Can AI agents debate global policy better than humans? We explore AI wargaming, from UN simulations to stress-testing geopolitics.
#45 When AI Safety Fails: The Guardrail Paradox (Dec 9, 2025): AI guardrails: Fences, failures, and free speech. Can we control AI's infinite output, or do digital fences always break?