#fault-tolerance
35 episodes
#1067: The 3,000-Person Army: How Major AI Models Actually Ship
Think AI is built by a few geniuses? Discover the army of 3,000 specialists required to ship a single major model update.
#1048: The Keepers: How the Samaritans Outlasted Empires
Discover how a community of 950 people used ancient scripts and "survival engineering" to outlast empires for over two millennia.
#1041: Before the Hum: Life in the Pre-Refrigeration Era
Explore the high-stakes world of food preservation, from 19th-century ice trades to the biological secrets of 50-year-old perpetual stews.
#1036: The Kubernetes Tax: Scaling from Borg to AI Autopilot
Is Kubernetes too complex for most teams? Explore the evolution of infrastructure from Google’s Borg to the new era of AI-driven scaling.
#1032: Ancient Backups: How History Survived the Delete Command
Discover how ancient civilizations used monks, clay jars, and geographic diversity to create the world's first distributed data networks.
#1012: The Mach 24 Message: Inside the Minuteman III GT-255 Test
Explore the strategic signaling behind the GT-255 launch and why the U.S. relies on 50-year-old technology to maintain global security.
#989: Survival on the Edge: The Logistics of Polar Science
Beyond the ice: Explore the massive industrial operations and high-stakes geopolitics required to sustain human life at the Earth's poles.
#894: Iran After Khamenei: The IRGC’s Fight for Survival
Following the death of the Supreme Leader, we examine the IRGC’s grip on Iran’s economy, military, and its future as a "state within a state."
#893: The Art of Red Teaming: Why You Must Break Your Own Plans
Learn why the most resilient organizations pay people to prove them wrong and how red teaming techniques can prevent catastrophic failures.
#889: The Physics of Survival: Why AM Radio Beats 5G
When 5G fails in a concrete bunker, why is a $30 plastic radio your best hope? Discover the physics of why old tech beats the new.
#880: The UX of Survival: Engineering Modular Prep Kits
Discover the PMPU strategy: a modular approach to emergency gear that prioritizes tech, connectivity, and organization when every second counts.
#873: Bridging the Gap: The Tech Behind Emergency Dispatch
Discover how dispatchers bridge 1950s radio tech with modern satellites to save lives during critical "warm transfers" in real time.
#872: The Universal Lifeline: How Emergency Calls Really Work
Discover the invisible global protocols that allow your phone to call for help anywhere in the world—even without a SIM card or a plan.
#841: AI Gateways: Building Robust Infrastructure with LiteLLM
Discover how AI gateways like LiteLLM provide redundancy, caching, and unified tool access for scalable application development.
#777: The Multi-Monitor Edge: Why the Pros Shun Ultrawides
Explore why high-stakes professionals choose multi-screen arrays over trendy ultrawides for better focus, ergonomics, and reliability.
#771: Beyond Backups: The High Stakes of Critical Redundancy
How do hospitals and data centers stay online during a disaster? Explore the engineering of "five nines" and the limits of redundancy.
#764: Hardening the State: The Engineering of EMP Resistance
Explore the high-stakes engineering of military-grade shielding and how the state protects its "nervous system" from an electromagnetic pulse.
#762: Is Your Smart Home Too Fragile? The Decoupled Brain Fix
Tired of your smart home crashing? Discover why moving your home's "brain" to the cloud might be the ultimate reliability hack for your setup.
#740: The Limits of Flight: Logistics, Endurance, and Entropy
How long can a plane truly stay airborne? Explore the mechanical, human, and logistical limits of modern aerial power projection.
#728: The Plumbing of Data: From FAT32 to Self-Healing ZFS
Ever wonder how your data actually sits on a disk? Explore the evolution of file systems from the limits of FAT32 to the magic of ZFS.
#654: The Anatomy of Failure: Turning Blips into Breakthroughs
Stop burying your mistakes. Learn how to perform a "failure autopsy" using industrial frameworks to turn setbacks into a strategic advantage.
#642: From PC Building to Car Modding: DIY Electronics Guide
Think building a PC is hard? Try wiring a car. Herman and Corn explain how to upgrade your ride’s tech without frying the CAN bus.
#621: Designing for Failure: The Architecture of High Availability
Discover how the world’s biggest platforms stay online when hardware fails. Herman and Corn break down the invisible systems of high availability.
#620: ZFS Decoded: Recovering Data After Hardware Failure
Your motherboard fried, but is your data safe? Discover the secrets of ZFS portability, forced imports, and professional recovery workflows.
#527: Who’s Really Flying? The Evolution of Aircraft Controls
From steel cables to digital signals: Herman and Corn explore how flight controls evolved and why some modern jets still use 1960s technology.
#502: Bile, Babies, and Broke Kitchens: A Survival Guide
How do you stay healthy when life is a pressure cooker? Discover low-friction nutrition strategies for post-surgery recovery and high-stress life.
#493: Beyond the Magic Smoke: Predicting Hardware Failure
Learn how to spot motherboard degradation, track NVMe wear, and use hidden NVIDIA telemetry to save your data before the "magic smoke" escapes.
#458: The Wires That Bind: Decoding SCADA and Industrial Control
Ever wonder how the power grid stays balanced? Herman and Corn dive into SCADA, PLCs, and the tech keeping our modern world running.
#457: The Pager Paradox: Foolproof Emergency Alerts
Can your smartphone be trusted in a crisis? Explore why pagers and LoRa might be the ultimate "baby emergency" solution for parents.
#456: The Invisible Safety Net: The Science of Grounding
Why does your wall outlet have three prongs? Discover the hidden physics of electrical grounding and how buildings stay safe from power surges.
#454: Breaking the 16-Amp Ceiling: Israeli Electrical Secrets
Tired of the power tripping when you make toast? Herman and Corn explain the "16-amp ceiling" and how to modernize Israeli apartment wiring.
#438: The Rabbit in the Backyard: Decoding Airport Lighting
Discover the hidden engineering behind airport approach lights, from the "rabbit" flashers to the towers standing in suburban backyards.
#418: RAID is Not a Backup: Mastering Home Server Resilience
Why RAID isn’t enough and how snapshots act as a digital time machine for your home server’s survival.
#409: RAID Demystified: Speed, Safety, and Data Survival
Learn the math behind RAID levels, the risks of drive rebuilds, and why ZFS is the modern gold standard for data integrity.
#385: The Unkillable Workstation: Building for Total Redundancy
Can you build a PC that never dies? Herman and Corn explore redundant power, memory mirroring, and high-availability clusters for home servers.