#ai-alignment
11 episodes
#2233: Who Actually Wants AI to Slow Down?
Daniel argues AI development should slow down for expertise and stability. But who in the industry actually shares this philosophy beyond the obvio...
#2194: Game Theory for Multi-Agent AI: Design Better, Fail Less
Nash equilibrium, mechanism design, and why your AI agents are playing prisoner's dilemma whether you know it or not.
#2186: The AI Persona Fidelity Challenge
Advanced LLMs dominate benchmarks but fail at staying in character—especially when asked to play morally complex or antagonistic roles. What does t...
#2177: Skip Fine-Tuning: Shape LLMs With Alignment Alone
Can you build a personalized LLM by skipping traditional fine-tuning and using only post-training alignment methods like DPO and GRPO? We break dow...
#2172: Council of Models: How Karpathy Built AI Peer Review
Andrej Karpathy's llm-council uses anonymized peer review to make language models evaluate each other fairly—but can it really suppress model bias?
#2068: Is Safety a Filter or a Feature?
External filters vs. baked-in ethics: the architectural war for LLM safety.
#1818: Inside Claude's Constitution: A System Prompt Deep Dive
We analyzed Claude Opus 4.6's full public system prompt to uncover its hidden rules for safety, product behavior, and refusal logic.
#664: AI’s Cultural Fingerprints: Training Data vs. Reinforcement
Is AI a neutral oracle or a mirror of our biases? Explore how training data and human feedback shape the cultural "soul" of modern models.
#121: Decoding RLHF: Why Your AI is So Annoyingly Nice
Ever wonder why AI is so polite? Herman and Corn dive into the mechanics of RLHF and how "niceness" gets baked into modern language models.
#45: AI Guardrails: Fences, Failures, & Free Speech
AI guardrails: Fences, failures, and free speech. Can we control AI's infinite output, or do digital fences always break?
#42: AI's Secret: Decoding the .5 Updates
Uncover the hidden world of AI's .5 updates. It's not just bug fixes—it's hundreds of millions and countless hours shaping smarter, safer AI.