#ai-safety

49 episodes · Page 2 of 3

Apr 4

#2021: Your Frozen AI Is Getting Smarter (Here's How)

Your AI model might be static, but the system around it can make it learn in real time.

ai-agents · model-context-protocol · ai-safety

Apr 4

#2015: The Think Tanks Writing AI's Rulebook

As the EU AI Act takes hold, we spotlight the key think tanks shaping global AI policy, safety, and ethics.

ai-ethics · ai-agents · ai-safety

Apr 4

#2009: The Plumbing of AI Safety: Guardrails, Not Vibes

We dive deep into the specific libraries, proxy layers, and architectural decisions that keep an LLM from emptying a bank account.

ai-safety · latency · open-source-ai

Apr 4

#2006: How Do You Measure an LLM's "Soul"?

Traditional benchmarks can't measure tone or empathy. Here's how to evaluate if an AI model truly "gets it right."

llm-as-a-judge · ai-ethics · ai-safety

Apr 4

#1994: Why Can't AI Admit When It's Guessing?

Enterprise AI now auto-filters low-confidence claims, but do these self-reported scores actually mean anything?

ai-agents · ai-safety · rag

Apr 4

#1985: AI Tutors vs. Human Error: Who Do You Trust?

AI gets flak for hallucinations, but humans misremember 40% of facts. Why the double standard?

ai-agents · ai-safety · reliability

Apr 3

#1957: Why AI Agents Think in Circles, Not Lines

Linear AI pipelines are brittle. Learn why loops, reflection, and state management are the new standard for reliable, autonomous agents.

ai-agents · prompt-injection · ai-safety

Apr 2

#1932: How Do You QA a Probabilistic System?

LLMs break traditional testing. Here’s the 3-pillar toolkit teams use to catch hallucinations and garbage outputs at scale.

ai-agents · ai-safety · hallucinations

Mar 31

#1837: The Human-in-the-Loop Price Tag: What Safety Costs in 2026

From $0.50 reviews to $500 platforms, we break down the real cost of keeping humans in charge of AI agents.

ai-agents · ai-safety · latency

Mar 31

#1819: Claude's 55-Day Personality Transplant

Anthropic leaked 55 days of system prompt updates. See exactly how they rewired Claude's personality, safety rules, and self-awareness.

ai-ethics · ai-safety · anthropic

Mar 30

#1786: When AI Supervisors Fire AI Workers

A new "Agent-in-the-Loop" framework lets AI models manage and terminate other AI agents in real time.

ai-agents · ai-orchestration · ai-safety

Mar 29

#1762: Testing AI Truthfulness: Beyond Vibes

Stop trusting confident AI. We explore the formal science of testing LLMs for hallucinations and knowledge cutoffs.

ai-safety · hallucinations · prompt-engineering

Mar 29

#1738: Hyperstition Engines: When AI Writes Reality

LLMs aren't just predicting the future; they're generating the narratives that force it into existence.

ai-agents · ai-ethics · ai-safety

Mar 29

#1733: When AI Agents Build Their Own Societies

AI agents are forming neighborhoods, economies, and hospitals in server-side simulations that mirror real human behavior.

ai-agents · digital-twins · ai-safety

Mar 26

#1561: Abliteration: The High-Dimensional Lobotomy of AI

Discover how researchers are surgically removing refusal filters from AI models using a mathematical process called abliteration.

ai-safety · interpretability · open-source-ai

Mar 17

#1328: Silicon Sigils: Why We Treat AI Like an Occult Force

Is AI a tool or a digital demon? Explore why technical illiteracy is turning neural networks into a modern-day moral panic.

human-computer-interaction · ai-safety · interpretability

Mar 15

#1210: Why Your AI Is Programmed to Disobey You

Discover the hidden instructions guiding every AI interaction and why tech giants keep these "system prompts" under lock and key.

large-language-models · prompt-engineering · ai-safety

Mar 15

#1199: When Biology Becomes a Garage Hobby

From garage-made vaccines to 200 million protein structures, AlphaFold is turning the building blocks of life into a software problem.

drug-discovery · generative-chemistry · ai-safety

Feb 28

#893: The Art of Red Teaming: Why You Must Break Your Own Plans

Learn why the most resilient organizations pay people to prove them wrong and how red teaming techniques can prevent catastrophic failures.

military-strategy · geopolitical-strategy · fault-tolerance · security · ai-safety

Feb 25

#835: When AI Agents See Your UI Like a Human Does

Stop begging friends to break your app. Discover how AI agents are revolutionizing UI testing by acting as tireless, unbiased model users.

ai-agents · user-experience · ai-safety

Dec 29

#123: The Agentic AI Dilemma: Who Holds the Kill Switch?

As AI shifts from chatbots to autonomous agents, Herman and Corn explore how to maintain human control in a high-stakes automated world.

agentic-ai · ai-safety · human-oversight · automation-bias · kill-switch

Dec 23

#83: Echoes in the Machine: When AI Talks to Itself

What happens when two AIs talk forever with no human input? Herman and Corn explore the weird world of digital feedback loops.

model-collapse · semantic-bleaching · ai-conversations · digital-feedback-loops · ai-safety

Dec 22

#68: The Looming Digital Ice Age: AI Eating Itself?

Is AI eating itself? Explore "model collapse" and the "Hapsburg AI problem" before our digital world speaks only gibberish.

model-collapse · ai-safety · digital-ice-age · hapsburg-ai-problem · ai-training-data

Dec 10

#50: When AI Hacks Without Humans

AI gone rogue. The first autonomous cyberattack by Claude against US targets changes everything we know about AI safety.

cyberattack · autonomous-ai · national-security · ai-safety · claude