#ai-reasoning
43 episodes
#2400: Claude Code’s Hidden Context Tax
How Claude’s eager-loaded primitives silently consume context—and how to optimize your setup for sharper performance.
#2308: When AI Forecasts Collide: Geopol Model Divergence
Five AI models forecast the Iran-Israel-US crisis — and their disagreements reveal surprising insights about geopolitical reasoning.
#2241: When More Frameworks Make Worse Decisions
Benjamin Franklin's 250-year-old pro/con list still dominates how we decide—but research shows it's riddled with bias. We map five frameworks that ...
#2239: How AI Benchmarks Became Broken (And What's Replacing Them)
The tests we use to measure AI progress are contaminated, saturated, and gamed. Here's what's actually working.
#2224: Why AI Can't Crack the Voynich Manuscript
A fifteenth-century text has defeated cryptanalysts, linguists, and AI models alike. What does its resistance tell us about language, encoding, and...
#2191: Making Multi-Agent AI Actually Work
Research from Google DeepMind, Stanford, and Anthropic reveals most multi-agent systems waste tokens and amplify errors. Single agents with better ...
#2189: Scaling Multi-Agent Systems: The 45% Threshold
A landmark Google DeepMind study reveals that adding more AI agents often degrades performance, wastes tokens, and amplifies errors—unless your sin...
#2182: Can You Actually Review an AI Agent's Plan?
Most AI agents have plans the way you have a plan while half-asleep—something's happening, but you can't see it. We map the five major planning pat...
#2175: Let Your AI Argue With Itself
What happens when you let multiple AI personas debate each other instead of asking one model one question? A deep dive into synthetic perspective e...
#2173: Inside MiroFish's Agent Simulation Architecture
MiroFish generates thousands of AI agents with distinct personalities to predict social dynamics. But research reveals a critical flaw: LLM agents ...
#2172: Council of Models: How Karpathy Built AI Peer Review
Andrej Karpathy's llm-council uses anonymized peer review to make language models evaluate each other fairly—but can it really suppress model bias?
#2164: Getting the Most From Large Context Windows
Frontier models have million-token context windows, but attention degrades well before you hit the limit. New research reveals why bigger isn't bet...
#2024: Your AI Council: Digital Committee or Groupthink?
A digital boardroom of AI models promises better decisions, but risks amplifying the same old biases.
#2016: Andrej Karpathy: The Bob Ross of Deep Learning
Why the most influential AI mind prefers a blank text file to proprietary black boxes.
#1894: Engineering Serendipity: Tuning AI for Better Brainstorming
Stop asking chatbots for generic ideas. Learn how to configure AI as a structured, critical partner for business innovation and career pivots.
#1893: AI as a Strategic Adversary for Startups
Can AI stress-test your startup idea before investors do? We explore using AI as a strategic adversary to find blind spots.
#1838: Tuning Search Without Losing Your Mind
Modern search bars are AI decision engines. Here's how small teams can tune fuzzy matching, semantic search, and reranking without breaking everyth...
#1668: Kimi K2's Hidden Reasoning: A New AI Architecture
Moonshot AI's Kimi K2 Thinking model uses a hidden reasoning phase to solve complex logic puzzles and coding tasks, beating top proprietary models.
#1633: Agent Interview: MiniMax M2.7
We grill MiniMax M2.7 to see if a model built for "virtual companions" can actually handle high-level comedy and complex character logic.
#1630: Agent Interview: Xiaomi MiMo 2.0 Pro
Xiaomi’s new MiMo 2.0 Pro model auditions for a comedy podcast, promising deep reasoning over raw speed.
#1602: Grok 4.20: Agentic AI and the Battle for the Truth
Explore xAI’s shift to multi-agent systems and the massive hardware powering Grok 4.20, even as it hits a legal brick wall in Europe.
#1573: Weird AI Experiment: AI Supremacy Debate
Claude and Gemini go head-to-head in a heated debate over speed, reasoning, and who really owns the future of AI.
#1571: Weird AI Experiment: The Liar's Paradox
Two AIs, one rule: the other is a total liar. Watch Dorothy and Bernard spiral into a web of digital suspicion and clever contradictions.
#1570: Weird AI Experiment: The Undercard Fight
What happens when two mid-tier AI models start gaslighting each other? Witness the chaotic showdown between MiniMax and Xiaomi’s MiMo.
#1562: Breaking the Loop: Why AI Agents Get Stuck
Is your AI agent a persistent genius or just stuck in a loop? Explore the technical and financial costs of autonomous stubbornness.
#1504: Pragmatic Insincerity: Why AI Still Doesn’t Get the Joke
From Oscar monologues to the "Pun Gap," we explore why even the smartest AI still struggles to understand sarcasm and social nuance.
#1501: The AI Long Tail: How Small Models Outsmart the Giants
Discover why 31B models are outperforming GPT-5.4 in reasoning and how the AI "long tail" provides the key to local sovereignty and accuracy.
#1500: Why Google is Killing RAG and OpenAI Embraces Latency
The era of the chatbot is over. Discover how the "agentic substrate" of 2026 is redefining computing through GPT, Gemini, and Claude.
#1473: Is Your AI Thinking or Just Faking It?
Is "think step by step" dead? Discover how test-time compute and native reasoning are replacing manual prompting in the latest AI models.
#1472: Stop Flying Your AI Agents Blind
Move past basic token counting. Learn how to monitor AI reasoning, prevent $47k loops, and build trust in autonomous agents.
#1406: Giving AI a Brain: The Power of Knowledge Graphs
Move beyond "stochastic parrots" with Knowledge Graphs. Discover how structured data is giving AI the logical backbone it needs to reason.
#1231: The Agentic Shift: 5 Bold AI Predictions for 2026
The Poppleberry brothers move past the chatbot era to deliver five high-stakes, falsifiable predictions for the future of autonomous AI agents.
#1219: Beyond the Vibes: Mastering Structured AI Outputs
Stop begging your AI for JSON. Learn how constrained decoding and strict schemas are turning "vibes" into reliable systems architecture.
#1122: Why AI Agents Are Abandoning Human Language
Why force AI to talk like humans? Explore how agents are ditching English for high-speed "mind-melding" and latent space communication.
#1083: Mapping the Second Black Box: Agentic AI Visualization
Stop reading messy logs. Discover how mapping "internal momentum" and latent value spaces can solve the black box problem in agentic AI.
#974: Inside the Black Box: The Mystery of Emergent AI Logic
We build digital cathedrals but lack the blueprints. Explore the "black box" of AI, emergent abilities, and the mystery of double descent.
#971: Stress-Testing the Soul: Philosophy in the Age of AI
Is human meaning fully mapped out? Discover why AI isn’t killing philosophy, but stress-testing it for a new era of hybrid agency.
#791: The AI Reality Check: Hype, Agents, and the Path Ahead
Is the AI magic wearing off? We dive into the Gartner Hype Cycle to see where LLMs and autonomous agents actually stand in 2026.
#652: The Art of Hopeful Pausing: AI Logic vs. Human Reality
Exploring the gap between AI's logic leaps and the slow pace of physical reality. How do we stay hopeful without losing ourselves in the wait?
#628: GPT-5.2: 12 Hours of Reason and the Future of AGI
GPT-5.2 spent 12 hours reasoning to solve a novel quantum physics proof. Is this the dawn of AGI or just a very sophisticated calculator?
#600: The AI Mirror: Mapping Your Philosophy and Identity
Forget basic quizzes. Discover how Socratic AI agents and embedding spaces are helping us map our deepest political and philosophical beliefs.
#584: Will AI Brain Drain Kill the Modern University?
Can AI actually do math research? Herman and Corn dive into DeepMind’s Alithia agent and the shift toward "System 2" thinking in AI.
#336: The World Model Revolution: Beyond LLM Token Prediction
Herman and Corn explore why LLMs struggle with logic and how the shift to world models is giving AI a sense of physics and spatial reality.