#data-storage

32 episodes

Jun 10

#3431: How YouTube Stores 500 Hours of Video Every Minute

YouTube's videos are shredded, replicated across global servers, and stored at a cost approaching zero. Here's how.

data-storage, infrastructure, hardware-reliability

Jun 2

#3217: When a Truck Beats the Internet: Shipping Data at Scale

Why FedEx sometimes beats fiber for moving massive datasets across the country.

data-integrity, logistics, data-storage

May 25

#3073: What 40,000-Year-Old Paint Teaches Us About Digital Storage

Cave paintings outlasted carved stone. Now engineers are using that chemistry to build thousand-year-proof discs.

material-science, data-storage, cave-painting

May 7

#2685: Plugin Data Storage for AI Agents

How to separate user data from plugin code across Linux, macOS, and Windows in agentic AI environments.

data-storage, ai-agents, cross-platform

May 1

#2571: How S3 Billing Actually Works (And Why R2 Is Different)

Storage is the decoy cost. The real surprises come from request charges, egress fees, and early deletion penalties.

cloud-computing, data-storage, latency

Apr 27

#2475: Docker Volumes: Why They Can't Move and What To Do

Docker made apps portable but left your data stuck. Here's how to actually move volumes between hosts.

docker, backup-strategies, data-storage

Apr 26

#2465: JSON-L vs Parquet: When Each Format Wins

How far can JSON-L scale before it breaks? And why does Parquet dominate for millions of rows?

data-storage, data-integrity, jsonl

Apr 26

#2438: The Folder Illusion: How Object Storage Fakes Hierarchy

Blobs, flat namespaces, and why those "folders" in cloud storage are complete illusions.

data-storage, cloud-computing, distributed-systems

Apr 21

#2368: The Multi-Stage Pipeline Behind Netflix's Recommendations

Unpacking the multi-stage AI pipeline behind Netflix, Spotify, and Amazon’s "you might also like" suggestions—from candidate generation to real-tim...

ai-models, data-storage, ai-training

Apr 17

#2271: Vector Search in a Single File

What if you could do vector search with just SQLite? We explore sqlite-vec, the extension that adds embeddings to the world's simplest database, an...

vector-databases, edge-computing, data-storage

Apr 6

#2064: Why GPT-5 Is Stuck: The Data Wall Explained

The "bigger is better" era of AI is over. Here's why the industry hit a data wall and shifted to a new scaling law.

large-language-models, ai-training, data-storage

Apr 4

#2011: Saving AI Knowledge Beyond the Chat Window

We're brilliant at prompting AI, but terrible at saving the answers. Here's why that "digital masterpiece on a chalkboard" vanishes.

knowledge-management, ai-agents, data-storage

Apr 4

#2010: Building Better AI Memory Systems

We obsess over AI inputs but treat outputs like Snapchat messages. Here's why that's a massive blind spot.

ai-agents, rag, data-storage

Apr 4

#1989: Your Cloud Photos Vanish If You Miss a $5 Bill

Is your data safe in the cloud, or is it one missed payment away from oblivion?

data-storage, home-lab, supply-chain-security

Apr 4

#1988: The Eternal Storage That Can't Escape the Lab

Quartz glass promises 10,000-year data storage, but can it scale before 180 zettabytes make it obsolete?

data-storage, hardware-engineering, glass-storage

Apr 4

#1983: Why Your Digital Photos Are Slowly Disappearing

Physical paper from the 1700s is more durable than a Word doc from 1994. Here's why digital data is fragile and how archivists fight bit rot.

data-storage, digital-forensics, hardware-reliability

Apr 2

#1920: InfluxDB vs. Postgres: The Time-Series Showdown

We compare specialized time-series databases like InfluxDB against traditional SQL options like Postgres with Timescale extensions.

data-storage, distributed-systems, software-development

Apr 2

#1910: Our Podcast Is Now a Permanent Research Artifact

Why we're uploading every episode to CERN's Zenodo archive, giving our AI experiments a permanent DOI and a life beyond streaming platforms.

open-source, data-storage, digital-forensics

Mar 31

#1797: Why the Cloud Runs on Cassette Tapes

The cloud isn't just hard drives—it's millions of robotic cassette tapes holding petabytes of data for Google and NASA.

data-storage, hardware-engineering, security

Mar 30

#1776: The Sync Trap: Why Your Backup Isn't Safe

Is your backup strategy a responsible habit or a full-blown compulsion? We explore the thin line between data safety and digital hoarding.

data-storage, digital-privacy, human-factors

Mar 23

#1475: The Folder Illusion: Why Cloud Storage Breaks Your Mental Model

Folders are a lie in the cloud. Explore why Amazon S3 uses flat namespaces and "keys" instead of traditional file hierarchies.

cloud-computing, data-storage, cloud-repatriation

Mar 15

#1233: Why "Just Use Postgres" Isn't Always Enough

Can one database do it all? Explore why hardware constraints and data geometry keep specialized databases like Snowflake and ClickHouse alive.

data-storage, architecture, distributed-systems

Mar 15

#1211: Escaping JOIN Hell: The SQL Developer’s Guide to Neo4j

Stop struggling with 15-deep JOINs. Learn how Neo4j turns relationships into first-class citizens for faster, more intuitive data modeling.

graph-databases, architecture, data-storage

Mar 12

#1124: The Database Explosion: Why One Size No Longer Fits All

From vector stores to edge computing, discover why the world now has over 1,000 databases and why Postgres isn't always the answer.

vector-databases, data-storage, edge-computing

Mar 8

#1044: Ezra the Scribe: Architect of a Portable Identity

Discover how Ezra the Scribe transformed a nation’s identity from a physical temple to a portable text, shaping the modern world.

political-history, linguistics, data-storage

Feb 21

#742: The Dark Archive: Saving Extremism for History

When mainstream sites delete toxic content, how do researchers save it? Explore the "memory hole" of digital hate speech and dark archives.

data-integrity, data-storage, osint, data-sovereignty, digital-preservation

Feb 19

#714: The Billion-Year Backup: Escaping the Digital Dark Age

Will our digital legacy survive for billions of years? Explore the tech fighting the "Digital Dark Age," from lunar libraries to quartz glass.

data-storage, digital-preservation, space-technology

Feb 14

#620: When ZFS Pools Survive Hardware Death

Your motherboard fried, but is your data safe? Discover the secrets of ZFS portability, forced imports, and professional recovery workflows.

data-integrity, fault-tolerance, data-storage

Feb 12

#591: When Electrons Teleport: The Physics Limit of Storage

From floppy disks to 4TB cards, how much data can we squeeze onto a fingernail before physics pushes back? Explore the future of storage density.

data-storage, hardware-engineering, semiconductors

Feb 12

#589: Taming the Digital Landfill: Version Control for AI Media

When AI agents and 4K video crash your repo, it’s time for better tools. Explore why Git fails and how Perforce and DVC save the day.

software-development, data-storage, training-data, infrastructure, version-control

Feb 10

#564: Beyond the Factory Reset: How to Truly Erase Your Data

Think a factory reset protects your old data? Herman and Corn reveal why your digital "ghosts" might still be lurking on your old devices.

data-storage, data-security, privacy, e-waste, digital-forensics

Feb 1

#409: When RAID Fails: The Rebuild Time Nightmare

Learn the math behind RAID levels, the risks of drive rebuilds, and why ZFS is the modern gold standard for data integrity.

data-storage, fault-tolerance, data-integrity
