#ai-inference
6 episodes
#1556: Faster Than Thought: The Engineering Behind Real-Time AI
From KV cache monsters to sub-100ms response times, explore the hardware and software innovations making real-time AI a reality.
#1479: The Speed of Thought: Inside the New Era of Inference
The war for model size is over. Explore the engineering breakthroughs making massive AI models faster than human thought.
#671: Keys to the Kingdom: Securing AI Model Weights
How do AI labs share their models without losing the secret sauce? Explore the tech keeping Claude secure in the Pentagon’s hands.
#484: The Silicon Sharing Economy: Inside Serverless GPUs
How do small teams run massive AI models without $50,000 chips? Corn and Herman dive into the hidden plumbing of serverless GPU providers.
#48: AI Inference Decoded: The How & Where of AI Magic
Ever wonder how AI magic happens? We demystify AI inference, exploring where and how models truly operate.
#38: AI Supercomputers: On Your Desk, Not Just The Cloud
AI supercomputers are landing on your desk! Discover why local AI is indispensable for enterprises facing API costs, latency, and privacy.