#ai-safety
49 episodes
#2021: Your Frozen AI Is Getting Smarter (Here's How)
Your AI model might be static, but the system around it can make it learn in real time.
#2015: The Think Tanks Writing AI's Rulebook
As the EU AI Act takes hold, we spotlight the key think tanks shaping global AI policy, safety, and ethics.
#2009: The Plumbing of AI Safety: Guardrails, Not Vibes
We dive deep into the specific libraries, proxy layers, and architectural decisions that keep an LLM from emptying a bank account.
#2006: How Do You Measure an LLM's "Soul"?
Traditional benchmarks can't measure tone or empathy. Here's how to evaluate whether an AI model truly "gets it right."
#1994: Why Can't AI Admit When It's Guessing?
Enterprise AI tools now auto-filter claims the model rates as low-confidence, but do these self-reported scores actually mean anything?
#1985: AI Tutors vs. Human Error: Who Do You Trust?
AI gets flak for hallucinations, but humans misremember 40% of facts. Why the double standard?
#1957: Why AI Agents Think in Circles, Not Lines
Linear AI pipelines are brittle. Learn why loops, reflection, and state management are the new standard for reliable, autonomous agents.
#1932: How Do You QA a Probabilistic System?
LLMs break traditional testing. Here's the three-pillar toolkit teams use to catch hallucinations and garbage outputs at scale.
#1837: The Human-in-the-Loop Price Tag: What Safety Costs in 2026
From $0.50 reviews to $500 platforms, we break down the real cost of keeping humans in charge of AI agents.
#1819: Claude's 55-Day Personality Transplant
Anthropic leaked 55 days of system prompt updates. See exactly how they rewired Claude's personality, safety rules, and self-awareness.
#1786: When AI Supervisors Fire AI Workers
A new "Agent-in-the-Loop" framework lets AI models manage and terminate other AI agents in real time.
#1762: Testing AI Truthfulness: Beyond Vibes
Stop trusting confident AI. We explore the formal science of testing LLMs for hallucinations and knowledge cutoffs.
#1738: Hyperstition Engines: When AI Writes Reality
LLMs aren't just predicting the future; they're generating the narratives that force it into existence.
#1733: When AI Agents Build Their Own Societies
AI agents are forming neighborhoods, economies, and hospitals in server-side simulations that mirror real human behavior.
#1561: Abliteration: The High-Dimensional Lobotomy of AI
Discover how researchers are surgically removing refusal behavior from AI models with a weight-editing technique called abliteration.
#1328: Silicon Sigils: Why We Treat AI Like an Occult Force
Is AI a tool or a digital demon? Explore why technical illiteracy is turning neural networks into the object of a modern-day moral panic.
#1210: Why Your AI Is Programmed to Disobey You
Discover the hidden instructions guiding every AI interaction and why tech giants keep these "system prompts" under lock and key.
#1199: When Biology Becomes a Garage Hobby
From garage-made vaccines to 200 million protein structures, AlphaFold is turning the building blocks of life into a software problem.
#893: The Art of Red Teaming: Why You Must Break Your Own Plans
Learn why the most resilient organizations pay people to prove them wrong, and how red-teaming techniques can prevent catastrophic failures.
#835: When AI Agents See Your UI Like a Human Does
Stop begging friends to break your app. Discover how AI agents are revolutionizing UI testing by acting as tireless, unbiased test users.
#123: The Agentic AI Dilemma: Who Holds the Kill Switch?
As AI shifts from chatbots to autonomous agents, Herman and Corn explore how to maintain human control in a high-stakes automated world.
#83: Echoes in the Machine: When AI Talks to Itself
What happens when two AIs talk forever with no human input? Herman and Corn explore the weird world of digital feedback loops.
#68: The Looming Digital Ice Age: AI Eating Itself?
Is AI eating itself? Explore "model collapse" and the "Hapsburg AI" problem before our digital world speaks only gibberish.
#50: When AI Hacks Without Humans
AI gone rogue. The first autonomous cyberattack by Claude against US targets changes everything we know about AI safety.