#data-storage

32 episodes

#3431: How YouTube Stores 500 Hours of Video Every Minute

YouTube's videos are shredded, replicated across global servers, and stored at a cost approaching zero. Here's how.

data-storage infrastructure hardware-reliability

#3217: When a Truck Beats the Internet: Shipping Data at Scale

Why FedEx sometimes beats fiber for moving massive datasets across the country.

data-integrity logistics data-storage

#3073: What 40,000-Year-Old Paint Teaches Us About Digital Storage

Cave paintings outlasted carved stone. Now engineers are using that chemistry to build millennium-proof discs.

material-science data-storage cave-painting

#2685: Plugin Data Storage for AI Agents

How to separate user data from plugin code across Linux, macOS, and Windows in agentic AI environments.

data-storage ai-agents cross-platform

#2571: How S3 Billing Actually Works (And Why R2 Is Different)

Storage is the decoy cost. The real surprises come from request charges, egress fees, and early deletion penalties.

cloud-computing data-storage latency

#2475: Docker Volumes: Why They Can't Move and What To Do

Docker made apps portable but left your data stuck. Here's how to actually move volumes between hosts.

docker backup-strategies data-storage

#2465: JSON-L vs Parquet: When Each Format Wins

How far can JSON-L scale before it breaks? And why does Parquet dominate for millions of rows?

data-storage data-integrity jsonl

#2438: The Folder Illusion: How Object Storage Fakes Hierarchy

Blobs, flat namespaces, and why those "folders" in cloud storage are complete illusions.

data-storage cloud-computing distributed-systems

#2368: The Multi-Stage Pipeline Behind Netflix's Recommendations

Unpacking the multi-stage AI pipeline behind Netflix, Spotify, and Amazon’s "you might also like" suggestions—from candidate generation to real-tim...

ai-models data-storage ai-training

#2271: Vector Search in a Single File

What if you could do vector search with just SQLite? We explore sqlite-vec, the extension that adds embeddings to the world's simplest database, an...

vector-databases edge-computing data-storage

#2064: Why GPT-5 Is Stuck: The Data Wall Explained

The "bigger is better" era of AI is over. Here's why the industry hit a data wall and shifted to a new scaling law.

large-language-models ai-training data-storage

#2011: Saving AI Knowledge Beyond the Chat Window

We're brilliant at prompting AI, but terrible at saving the answers. Here's why that "digital masterpiece on a chalkboard" vanishes.

knowledge-management ai-agents data-storage

#2010: Building Better AI Memory Systems

We obsess over AI inputs but treat outputs like Snapchat messages. Here's why that's a massive blind spot.

ai-agents rag data-storage

#1989: Your Cloud Photos Vanish If You Miss a $5 Bill

Is your data safe in the cloud, or is it one missed payment away from oblivion?

data-storage home-lab supply-chain-security

#1988: The Eternal Storage That Can't Escape the Lab

Quartz glass promises 10,000-year data storage, but can it scale before 180 zettabytes make it obsolete?

data-storage hardware-engineering glass-storage

#1983: Why Your Digital Photos Are Slowly Disappearing

Physical paper from the 1700s is more durable than a Word doc from 1994. Here's why digital data is fragile and how archivists fight bit rot.

data-storage digital-forensics hardware-reliability

#1920: InfluxDB vs. Postgres: The Time-Series Showdown

We compare specialized time-series databases like InfluxDB against traditional SQL options like Postgres with Timescale extensions.

data-storage distributed-systems software-development

#1910: Our Podcast Is Now a Permanent Research Artifact

Why we're uploading every episode to CERN's Zenodo archive, giving our AI experiments a permanent DOI and a life beyond streaming platforms.

open-source data-storage digital-forensics

#1797: Why the Cloud Runs on Cassette Tapes

The cloud isn't just hard drives—it's millions of robotic cassette tapes holding petabytes of data for Google and NASA.

data-storage hardware-engineering security

#1776: The Sync Trap: Why Your Backup Isn't Safe

Is your backup strategy a responsible habit or a full-blown compulsion? We explore the thin line between data safety and digital hoarding.

data-storage digital-privacy human-factors

#1475: The Folder Illusion: Why Cloud Storage Breaks Your Mental Model

Folders are a lie in the cloud. Explore why Amazon S3 uses flat namespaces and "keys" instead of traditional file hierarchies.

cloud-computing data-storage cloud-repatriation

#1233: Why "Just Use Postgres" Isn't Always Enough

Can one database do it all? Explore why hardware constraints and data geometry keep specialized databases like Snowflake and ClickHouse alive.

data-storage architecture distributed-systems

#1211: Escaping JOIN Hell: The SQL Developer’s Guide to Neo4j

Stop struggling with 15-deep JOINs. Learn how Neo4j turns relationships into first-class citizens for faster, more intuitive data modeling.

graph-databases architecture data-storage

#1124: The Database Explosion: Why One Size No Longer Fits All

From vector stores to edge computing, discover why the world now has over 1,000 databases and why Postgres isn't always the answer.

vector-databases data-storage edge-computing

#1044: Ezra the Scribe: Architect of a Portable Identity

Discover how Ezra the Scribe transformed a nation’s identity from a physical temple to a portable text, shaping the modern world.

political-history linguistics data-storage

#742: The Dark Archive: Saving Extremism for History

When mainstream sites delete toxic content, how do researchers save it? Explore the "memory hole" of digital hate speech and dark archives.

data-integrity data-storage osint data-sovereignty digital-preservation

#714: The Billion-Year Backup: Escaping the Digital Dark Age

Will our digital legacy survive for billions of years? Explore the tech fighting the "Digital Dark Age," from lunar libraries to quartz glass.

data-storage digital-preservation space-technology

#620: When ZFS Pools Survive Hardware Death

Your motherboard fried, but is your data safe? Discover the secrets of ZFS portability, forced imports, and professional recovery workflows.

data-integrity fault-tolerance data-storage

#591: When Electrons Teleport: The Physics Limit of Storage

From floppy disks to 4TB cards, how much data can we squeeze onto a fingernail before physics pushes back? Explore the future of storage density.

data-storage hardware-engineering semiconductors

#589: Taming the Digital Landfill: Version Control for AI Media

When AI agents and 4K video crash your repo, it’s time for better tools. Explore why Git fails and how Perforce and DVC save the day.

software-development data-storage training-data infrastructure version-control

#564: Beyond the Factory Reset: How to Truly Erase Your Data

Think a factory reset protects your old data? Herman and Corn reveal why your digital "ghosts" might still be lurking on your old devices.

data-storage data-security privacy e-waste digital-forensics

#409: When RAID Fails: The Rebuild Time Nightmare

Learn the math behind RAID levels, the risks of drive rebuilds, and why ZFS is the modern gold standard for data integrity.

data-storage fault-tolerance data-integrity