AI Safety & Ethics

Guardrails & Alignment

Safety measures, content filtering, red-teaming

26 episodes

#3724: How the Pope's Letter on AI Actually Works

Unpacking the Pope’s new encyclical on AI: what it is, how Catholics interpret it, and why it matters beyond the Church.

ai-ethics, generative-ai, international-relations

#3658: How Reddit Built Guardrails for Anonymity

Reddit didn't solve harassment by killing anonymity. It built friction, reputation systems, and distributed governance.

social-engineering, content-provenance, online-privacy

#3422: How Rival Labs Reverse-Engineer a New AI Model in Hours

Inside the organized frenzy when a closed-source model drops — and how competitors map its every weakness.

ai-agents, ai-security, prompt-injection

#3209: When Algorithms Become Censors

How SLAPP suits, libel tourism, and Google's algorithm chill journalism more effectively than any law.

free-speech, misinformation, social-engineering

#2909: The Reassurance Mirage: When Moderation Fails

How the EU Digital Services Act exposes a 30-to-1 gap in appeal success rates between platforms.

ai-ethics, content-provenance, misinformation

#2808: Falling for Your Chatbot: Love, Loss, and Language Models

Real cases of people falling in love with AI companions, why memory makes it feel real, and what happens when the illusion breaks.

ai-ethics, conversational-ai, ai-memory

#2558: Should You Say Please to AI?

The surprising cost, technical tradeoffs, and ethical dilemmas of saying "please" to chatbots.

ai-ethics, prompt-engineering, human-computer-interaction

#2526: How Peer Review Actually Works (and Fails)

The history of peer review, the Lancet's biggest scandals, and why arXiv is changing everything.

misinformation, open-source, medical-history

#2518: How Jailbreaking Reveals AI's Hidden Tension

What the DAN prompt and grandma exploits reveal about the structural conflict inside every LLM.

prompt-engineering, ai-safety, ai-alignment

#2472: When Guardrails Break: The Hidden Costs of AI Gateway Filtering

PII detection at the gateway layer can block legitimate invoices. Here's how guardrails actually work and where they fail.

ai-security, latency, prompt-injection

#2413: When Your AI Says No to Everything

Why LLMs refuse 73% of harmless prompts — and the trade-off between safety and usefulness.

ai-safety, ai-alignment, prompt-engineering

#2412: When AI Caves: Progressive vs. Regressive Sycophancy

Why do LLMs agree with you even when you're wrong? We break down the SycEval benchmark and the 78% persistence problem.

ai-safety, ai-alignment, hallucinations

#2410: How Researchers Actually Measure Censorship in Chinese LLMs

Beyond headlines: the actual benchmarks, methodologies, and pitfalls in detecting political refusal in Chinese language models.

large-language-models, ai-safety, cultural-bias

#2407: Three Landings in 90 Days: Pilot Automation Dependency

Why pilots aren't hand-flying enough, the regulatory floor that lets it happen, and what airlines are doing about it.

aviation-technology, human-factors, situational-awareness

#2250: How Incentives Shape AI Safety Research

Vendor labs, independent research orgs, government agencies—the AI safety field is messier and more diverse than most people realize. A map of wher...

ai-safety, ai-alignment, anthropic

#2246: Constitutional AI: Anthropic's Theory of Safe Scaling

How Anthropic's Constitutional AI replaces human raters with AI self-critique guided by explicit principles—and what it assumes about the future of...

anthropic, ai-safety, ai-alignment

#2190: Simulating Extreme Decisions With LLMs

LLMs fail at the exact problem wargaming was built to solve—simulating irrational, extreme decision-makers. A new study reveals why.

large-language-models, ai-safety, hallucinations

#2186: The AI Persona Fidelity Challenge

Advanced LLMs dominate benchmarks but fail at staying in character—especially when asked to play morally complex or antagonistic roles. What does t...

ai-safety, ai-alignment, hallucinations

#2068: Is Safety a Filter or a Feature?

External filters vs. baked-in ethics: the architectural war for LLM safety.

ai-safety, ai-ethics, ai-alignment

#2045: Anonymity Isn't the Problem, The Architecture Is

Why does Reddit amplify toxicity while other anonymous spaces stay healthy? It's not the mask—it's the room's shape.

digital-privacy, social-engineering, human-computer-interaction

#2029: ADHD Brains: Why Willpower Fails & How to Hack It

Stop blaming yourself for half-used planners. Here’s the neurobiology behind ADHD time management.

adhd, neuroscience, executive-function

#2015: The Think Tanks Writing AI's Rulebook

As the EU AI Act takes hold, we spotlight the key think tanks shaping global AI policy, safety, and ethics.

ai-ethics, ai-agents, ai-safety

#2009: The Plumbing of AI Safety: Guardrails, Not Vibes

We dive deep into the specific libraries, proxy layers, and architectural decisions that keep an LLM from emptying a bank account.

ai-safety, latency, open-source-ai

#1996: Why Leaders Broadcast Victory While Citizens Hear Sirens

A gap opens between official statements and reality, as curated videos clash with live data streams.

geopolitical-strategy, narrative-dissonance, public-trust

#1803: Why Hostages Defend Their Captors

A tech exec was brainwashed in 2025. The neurochemistry is the same as Stockholm Syndrome.

neuroscience, psychopharmacology, social-engineering

#1712: Five AIs, One Question: A Tiananmen Square Test

We asked five AI models the same question about Tiananmen Square. Their answers reveal a stark divide between Chinese and Western AI.

ai-ethics, geopolitics, ai-censorship