
#ai-inference

6 episodes

#1556: Faster Than Thought: The Engineering Behind Real-Time AI

From KV cache monsters to sub-100ms response times, explore the hardware and software innovations making real-time AI a reality.

latency · ai-inference · hardware-acceleration

#1479: The Speed of Thought: Inside the New Era of Inference

The war for model size is over. Explore the engineering breakthroughs making massive AI models faster than human thought.

ai-inference · large-language-models · quantization

#671: Keys to the Kingdom: Securing AI Model Weights

How do AI labs share their models without losing the secret sauce? Explore the tech keeping Claude secure in the Pentagon’s hands.

ai-security · intellectual-property · anthropic · national-security · ai-inference

#484: The Silicon Sharing Economy: Inside Serverless GPUs

How do small teams run massive AI models without $50,000 chips? Corn and Herman dive into the hidden plumbing of serverless GPU providers.

cloud-computing · ai-inference · latency · gpu-acceleration · infrastructure

#48: AI Inference Decoded: The How & Where of AI Magic

Ever wonder how AI magic happens? We demystify AI inference, exploring where and how models truly operate.

ai-inference · ai-deployment · cloud-computing · on-premises · data-security

#38: AI Supercomputers: On Your Desk, Not Just The Cloud

AI supercomputers are landing on your desk! Discover why local AI is indispensable for enterprises facing API costs, latency, and privacy concerns.

ai-supercomputers · local-ai · edge-computing · ai-inference · ai-training