#data-storage
32 episodes
#3431: How YouTube Stores 500 Hours of Video Every Minute
YouTube's videos are shredded, replicated across global servers, and stored at a cost approaching zero. Here's how.
#3217: When a Truck Beats the Internet: Shipping Data at Scale
Why FedEx sometimes beats fiber for moving massive datasets across the country.
#3073: What 40,000-Year-Old Paint Teaches Us About Digital Storage
Cave paintings outlasted carved stone. Now engineers are using that chemistry to build millennium-proof discs.
#2685: Plugin Data Storage for AI Agents
How to separate user data from plugin code across Linux, macOS, and Windows in agentic AI environments.
#2571: How S3 Billing Actually Works (And Why R2 Is Different)
Storage is the decoy cost. The real surprises come from request charges, egress fees, and early deletion penalties.
#2475: Docker Volumes: Why They Can't Move and What To Do
Docker made apps portable but left your data stuck. Here's how to actually move volumes between hosts.
#2465: JSON-L vs Parquet: When Each Format Wins
How far can JSON-L scale before it breaks? And why does Parquet dominate for millions of rows?
#2438: The Folder Illusion: How Object Storage Fakes Hierarchy
Blobs, flat namespaces, and why those "folders" in cloud storage are complete illusions.
#2368: The Multi-Stage Pipeline Behind Netflix's Recommendations
Unpacking the multi-stage AI pipeline behind Netflix, Spotify, and Amazon’s "you might also like" suggestions—from candidate generation to real-tim...
#2271: Vector Search in a Single File
What if you could do vector search with just SQLite? We explore sqlite-vec, the extension that adds embeddings to the world's simplest database, an...
#2064: Why GPT-5 Is Stuck: The Data Wall Explained
The "bigger is better" era of AI is over. Here's why the industry hit a data wall and shifted to a new scaling law.
#2011: Saving AI Knowledge Beyond the Chat Window
We're brilliant at prompting AI, but terrible at saving the answers. Here's why that "digital masterpiece on a chalkboard" vanishes.
#2010: Building Better AI Memory Systems
We obsess over AI inputs but treat outputs like Snapchat messages. Here's why that's a massive blind spot.
#1989: Your Cloud Photos Vanish If You Miss a $5 Bill
Is your data safe in the cloud, or is it one missed payment away from oblivion?
#1988: The Eternal Storage That Can't Escape the Lab
Quartz glass promises 10,000-year data storage, but can it scale before 180 zettabytes make it obsolete?
#1983: Why Your Digital Photos Are Slowly Disappearing
Physical paper from the 1700s is more durable than a Word doc from 1994. Here's why digital data is fragile and how archivists fight bit rot.
#1920: InfluxDB vs. Postgres: The Time-Series Showdown
We compare specialized time-series databases like InfluxDB against traditional SQL options like Postgres with the TimescaleDB extension.
#1910: Our Podcast Is Now a Permanent Research Artifact
Why we're uploading every episode to CERN's Zenodo archive, giving our AI experiments a permanent DOI and a life beyond streaming platforms.
#1797: Why the Cloud Runs on Cassette Tapes
The cloud isn't just hard drives. It's also millions of cassette-style tapes in robotic libraries, holding petabytes of data for Google and NASA.
#1776: The Sync Trap: Why Your Backup Isn't Safe
Is your backup strategy a responsible habit or a full-blown compulsion? We explore the thin line between data safety and digital hoarding.
#1475: The Folder Illusion: Why Cloud Storage Breaks Your Mental Model
Folders are a lie in the cloud. Explore why Amazon S3 uses flat namespaces and "keys" instead of traditional file hierarchies.
#1233: Why "Just Use Postgres" Isn't Always Enough
Can one database do it all? Explore why hardware constraints and data geometry keep specialized databases like Snowflake and ClickHouse alive.
#1211: Escaping JOIN Hell: The SQL Developer’s Guide to Neo4j
Stop struggling with 15-deep JOINs. Learn how Neo4j turns relationships into first-class citizens for faster, more intuitive data modeling.
#1124: The Database Explosion: Why One Size No Longer Fits All
From vector stores to edge computing, discover why the world now has over 1,000 databases and why Postgres isn't always the answer.
#1044: Ezra the Scribe: Architect of a Portable Identity
Discover how Ezra the Scribe transformed a nation’s identity from a physical temple to a portable text, shaping the modern world.
#742: The Dark Archive: Saving Extremism for History
When mainstream sites delete toxic content, how do researchers save it? Explore the "memory hole" of digital hate speech and dark archives.
#714: The Billion-Year Backup: Escaping the Digital Dark Age
Will our digital legacy survive for billions of years? Explore the tech fighting the "Digital Dark Age," from lunar libraries to quartz glass.
#620: When ZFS Pools Survive Hardware Death
Your motherboard fried, but is your data safe? Discover the secrets of ZFS portability, forced imports, and professional recovery workflows.
#591: When Electrons Teleport: The Physics Limit of Storage
From floppy disks to 4TB cards, how much data can we squeeze onto a fingernail before physics pushes back? Explore the future of storage density.
#589: Taming the Digital Landfill: Version Control for AI Media
When AI agents and 4K video crash your repo, it’s time for better tools. Explore why Git fails and how Perforce and DVC save the day.
#564: Beyond the Factory Reset: How to Truly Erase Your Data
Think a factory reset protects your old data? Herman and Corn reveal why your digital "ghosts" might still be lurking on your old devices.
#409: When RAID Fails: The Rebuild Time Nightmare
Learn the math behind RAID levels, the risks of drive rebuilds, and why ZFS is the modern gold standard for data integrity.