#ai-agents
344 episodes · Page 4 of 15
#2195: Nash's Real Genius (And Why the Movie Got It Wrong)
The bar scene in A Beautiful Mind is mathematically wrong—and it obscures Nash's actual breakthrough. We trace the real ideas from his 1950 papers ...
#2194: Game Theory for Multi-Agent AI: Design Better, Fail Less
Nash equilibrium, mechanism design, and why your AI agents are playing the prisoner's dilemma whether you know it or not.
#2191: Making Multi-Agent AI Actually Work
Research from Google DeepMind, Stanford, and Anthropic reveals most multi-agent systems waste tokens and amplify errors. Single agents with better ...
#2189: Scaling Multi-Agent Systems: The 45% Threshold
A landmark Google DeepMind study reveals that adding more AI agents often degrades performance, wastes tokens, and amplifies errors—unless your sin...
#2185: Taking AI Agents From Demo to Production
Sixty-two percent of companies are experimenting with AI agents, but only 23% are scaling them—and 40% of projects will be canceled by 2027. The ga...
#2184: The Economics of Running AI Agents
Production AI agents can cost $500K/month before optimization. Learn model routing, prompt caching, and token budgeting to cut costs 40-85% without...
#2182: Can You Actually Review an AI Agent's Plan?
Most AI agents have plans the way you have a plan while half-asleep—something's happening, but you can't see it. We map the five major planning pat...
#2181: When RAG Becomes an Agent
RAG in chatbots is simple retrieval. RAG in agents is a multi-step decision loop. Here's what actually changes.
#2180: The Sandboxing Tradeoff in Agent Design
AI agents need broad permissions to be useful—but every permission expands the attack surface. We map the real threat landscape and the isolation t...
#2179: Building Cost-Resilient AI Agents
Failed API calls in agent loops aren't just technical problems—they're direct budget drains. Here's how checkpointing, retry strategies, and cachin...
#2178: How to Actually Evaluate AI Agents
Frontier models score 80% on one agent benchmark and 45% on another. The difference isn't the model—it's contamination, scaffolding, and how the te...
#2174: Role-Playing as Orchestration
How a role-playing protocol from NeurIPS 2023 became one of AI's most underrated agent frameworks—and what happens when you scale it to a million a...
#2173: Inside MiroFish's Agent Simulation Architecture
MiroFish generates thousands of AI agents with distinct personalities to predict social dynamics. But research reveals a critical flaw: LLM agents ...
#2171: How IQT Labs Built a Wargaming LLM (Then Archived It)
A deep code review of Snowglobe, IQT Labs' open-source LLM wargaming system that ran real national security simulations before being archived. What...
#2170: Pricing Agentic AI When Nothing's Predictable
How do you charge fixed prices for systems that operate in fundamental uncertainty? Consultants are discovering frameworks that work—but they requi...
#2169: How Enterprises Are Rethinking Agent Frameworks
Twelve major agentic AI frameworks exist—yet many serious developers avoid them entirely. What patterns emerge in real enterprise adoption?
#2168: What Serious Agentic AI Developers Actually Need to Know
Python, TypeScript, LangGraph, and the frameworks reshaping how agents work. A technical map of the skills and concepts that separate prototypes fr...
#2167: Sync vs. Async: Architecting Agents for Scale
Why most enterprise AI agents fail in production has less to do with models and more to do with whether they're built synchronously or asynchronously.
#2166: Code vs. Canvas: How Developers Pick Their Tools
LangGraph or Flowise? The honest answer isn't obvious. Developers gain speed and integrations with visual builders—but lose version control, testin...
#2165: Strip Your Agent to Bash
The frameworks matter less than you think. What separates a working agent from a failing one is the harness—the orchestration, memory, and tool des...
#2163: Designing Autonomy Boundaries for AI Agents
Production data reveals a surprising truth: fully autonomous AI agents waste 98% of their context window on tool descriptions. Here's why the indus...
#2158: Claude Managed Agents: Brain Versus Hands
Anthropic's new Managed Agents service runs your agent loop on their infrastructure. Here's what you gain, what you lose, and who it's actually for.
#2146: The AI Wargame's Flat Hierarchy Problem
AI wargames treat NGOs and nuclear powers as equals. That's a dangerous flaw for real-world policy planning.
#2144: AI Wargaming: One Model or Many?
Should geopolitical AI simulations use one model or many? We debate the pros and cons of a single-model approach.