AI Safety & Ethics

Security, alignment, and responsible AI

64 episodes · Page 2 of 3

#2482: When AI Chatbots Leak Your PDFs via Public S3 Buckets

A user uploaded a sensitive PDF to an AI chatbot. The chatbot stored it in a public S3 bucket with zero authentication.

data-security · ai-security · cloud-computing

#2472: When Guardrails Break: The Hidden Costs of AI Gateway Filtering

PII detection at the gateway layer can block legitimate invoices. Here's how guardrails actually work and where they fail.

ai-security · latency · prompt-injection

#2430: Where Men's Advocacy Crosses Into Misogyny

How to acknowledge real male grievances without falling into the manosphere's woman-hating fringe.

misinformation · extremism · social-engineering

#2424: What Feminists Actually Mean by "The Patriarchy"

Unpacking the structural concept, the popular shorthand, and where the line gets blurry between critiquing systems and demonizing individuals.

cultural-bias · misinformation · free-speech

#2413: When Your AI Says No to Everything

Why LLMs refuse 73% of harmless prompts — and the trade-off between safety and usefulness.

ai-safety · ai-alignment · prompt-engineering

#2412: When AI Caves: Progressive vs. Regressive Sycophancy

Why do LLMs agree with you even when you're wrong? We break down the SycEval benchmark and the 78% persistence problem.

ai-safety · ai-alignment · hallucinations

#2411: Are Political Bias Benchmarks Actually Measuring Anything?

Why the Political Compass Test fails, and what researchers are building instead to actually measure model bias.

ai-ethics · cultural-bias · benchmarks

#2410: How Researchers Actually Measure Censorship in Chinese LLMs

Beyond headlines: the actual benchmarks, methodologies, and pitfalls in detecting political refusal in Chinese language models.

large-language-models · ai-safety · cultural-bias

#2409: When AI Cheats on Cultural Knowledge

Five benchmarks that reveal how AI systems fail at cultural knowledge — and what their methodologies tell us.

cultural-bias · benchmarks · multimodal-ai

#2407: Three Landings in 90 Days: Pilot Automation Dependency

Why pilots aren't hand-flying enough, the regulatory floor that lets it happen, and what airlines are doing about it.

aviation-technology · human-factors · situational-awareness

#2383: The Blame Gap: Public Anger vs. Breach Reality

How much blame do companies deserve for data breaches? The answer isn't as simple as you think.

cybersecurity · data-security · digital-privacy

#2372: Choosing the Right Sandbox for Your Threat Model

Explore the tools and methods for creating secure, isolated environments to test malware, browse privately, and protect sensitive systems.

cybersecurity · privacy · operating-systems

#2250: How Incentives Shape AI Safety Research

Vendor labs, independent research orgs, government agencies—the AI safety field is messier and more diverse than most people realize. A map of wher...

ai-safety · ai-alignment · anthropic

#2246: Constitutional AI: Anthropic's Theory of Safe Scaling

How Anthropic's Constitutional AI replaces human raters with AI self-critique guided by explicit principles—and what it assumes about the future of...

anthropic · ai-safety · ai-alignment

#2190: Simulating Extreme Decisions With LLMs

LLMs fail at the exact problem wargaming was built to solve—simulating irrational, extreme decision-makers. A new study reveals why.

large-language-models · ai-safety · hallucinations

#2186: The AI Persona Fidelity Challenge

Advanced LLMs dominate benchmarks but fail at staying in character—especially when asked to play morally complex or antagonistic roles. What does t...

ai-safety · ai-alignment · hallucinations

#2180: The Sandboxing Tradeoff in Agent Design

AI agents need broad permissions to be useful—but every permission expands the attack surface. We map the real threat landscape and the isolation t...

ai-agents · ai-security · prompt-injection

#2134: The Fog-of-War Problem in AI Wargaming

Why shared AI brains make secret-keeping a nightmare, and the four architectural patterns researchers use to fix it.

ai-agents · military-strategy · data-integrity

#2102: Why Don't You Notice AI Security Delays?

Multi-layer security checks add latency, but modern CLIs hide it under 100ms using parallelization and speculation.

ai-agents · latency · cybersecurity

#2068: Is Safety a Filter or a Feature?

External filters vs. baked-in ethics: the architectural war for LLM safety.

ai-safety · ai-ethics · ai-alignment

#2045: Anonymity Isn't the Problem, The Architecture Is

Why does Reddit amplify toxicity while other anonymous spaces stay healthy? It's not the mask—it's the room's shape.

digital-privacy · social-engineering · human-computer-interaction

#2029: ADHD Brains: Why Willpower Fails & How to Hack It

Stop blaming yourself for half-used planners. Here’s the neurobiology behind ADHD time management.

adhd · neuroscience · executive-function

#2015: The Think Tanks Writing AI's Rulebook

As the EU AI Act takes hold, we spotlight the key think tanks shaping global AI policy, safety, and ethics.

ai-ethics · ai-agents · ai-safety

#2009: The Plumbing of AI Safety: Guardrails, Not Vibes

We dive deep into the specific libraries, proxy layers, and architectural decisions that keep an LLM from emptying a bank account.

ai-safety · latency · open-source-ai