#1943: Why Tar Isn't Compression (And What Is)

LZMA, Zstandard, and Brotli are shrinking massive AI models, but how do they actually work?

Episode Details
Episode ID
MWP-2099
Published
Duration
21:44
Audio
Direct link
Pipeline
V5
TTS Engine
chatterbox-regular
Script Writing Agent
Gemini 3 Flash

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

The Unseen Engine of the Modern Internet

Every time you download a large file, stream a video, or pull the latest AI model, you’re relying on decades of mathematical innovation designed to shrink data as small as possible. While most users simply right-click and hit "compress," the difference between legacy tools and modern algorithms is the difference between a minor trim and a total reconstruction of the data.

The Misunderstood Tape Archive

First, we need to clear up a common misconception: the tar command does not compress anything. Standing for Tape Archive, tar simply bundles multiple files into a single continuous stream, preserving folder structures and permissions without reducing size. It’s the equivalent of putting ten boxes into one giant crate—the crate is easier to move, but it isn’t smaller. The magic happens when you apply a compressor to that tarball, creating formats like .tar.gz. Compressing a single tar stream is far more efficient than compressing files individually because the algorithm can identify patterns across the entire batch, a technique known as "solid compression."
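
Both claims are easy to verify with Python's standard library. The sketch below (file names and contents are invented for illustration) builds a tar stream in memory, then compares compressing each file individually against compressing the single tar stream:

```python
import io
import tarfile
import zlib

# Ten "files" that share a lot of structure, like source files with
# identical license headers (names and contents are illustrative).
header = b"// Copyright (c) Example Corp. All rights reserved.\n" * 20
files = {f"src/file{i}.cpp": header + f"int value{i};\n".encode() for i in range(10)}

# Bundle with tar: the archive is *larger* than the raw bytes, because
# tar adds a 512-byte header plus block padding for every file.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name, data in files.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
tar_bytes = buf.getvalue()
raw_total = sum(len(d) for d in files.values())

# Per-file compression re-learns the shared header ten times.
individual = sum(len(zlib.compress(d, 9)) for d in files.values())

# "Solid" compression of the single tar stream stores the header once
# and back-references it for the other nine files.
solid = len(zlib.compress(tar_bytes, 9))

print(f"raw={raw_total}  tar={len(tar_bytes)}  per-file={individual}  solid={solid}")
```

Running this shows the tar stream is larger than the raw input, while the solid pass comes out well under the sum of the per-file results.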

LZMA2: The King of Ratios

When absolute smallest file size is the goal, LZMA2 (used in 7-Zip) is the heavyweight champion. Its secret weapon is dictionary size. Compression algorithms work by remembering patterns they’ve seen before; the larger the dictionary, the further back the algorithm can look for repeating strings.

Standard Gzip uses the Deflate algorithm with a tiny 32KB sliding window. It’s like trying to memorize a book while only remembering the last two pages. If a pattern repeats three pages back, Gzip treats it as new data. LZMA2, by contrast, can utilize a dictionary of up to one gigabyte. It can remember a pattern seen hundreds of megabytes earlier, making it ideal for massive software repositories or datasets. The trade-off is significant resource usage; LZMA2 compression is slow and RAM-heavy, trading CPU cycles today for massive bandwidth and storage savings forever.
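
The window-size effect can be demonstrated directly with the standard library. The sketch below builds data whose only redundancy is a repeat that starts 64 KB back, outside Deflate's 32 KB window but well inside LZMA's dictionary (the sizes are illustrative):

```python
import lzma
import random
import zlib

# A 64 KB block of pseudo-random (effectively incompressible) data,
# repeated once. The second copy begins 64 KB after the first, beyond
# Deflate's 32 KB sliding window but well inside LZMA's dictionary.
block = random.Random(0).randbytes(64 * 1024)
data = block * 2

deflate_size = len(zlib.compress(data, 9))      # cannot see the repeat
lzma_size = len(lzma.compress(data, preset=9))  # matches the whole block

print(f"input={len(data)}  deflate={deflate_size}  lzma={lzma_size}")
```

Deflate ends up storing both copies at nearly full size, while LZMA stores roughly one copy plus a back-reference.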

Zstandard: The Speed Demon

Zstandard (zstd), designed by Yann Collet at Meta, aims to solve the speed-versus-ratio dilemma. It seeks to deliver LZMA-level compression ratios with Gzip-level speeds. It achieves this through Finite State Entropy (FSE), a coding method that provides the efficiency of complex arithmetic coding with the speed of simpler Huffman coding.

Zstandard is incredibly versatile, offering compression levels from 1 to 22. At lower levels, it’s faster than Gzip and compresses better; at higher levels, it rivals LZMA in size. This flexibility has made it the gold standard for real-time log compression, Linux package distribution, and increasingly, AI model weights. Its decompression speed is lightning-fast and highly parallel, making it perfect for the "cold start" problem in serverless AI, where downloading and decompressing model weights is a major bottleneck.

Brotli: The Web Specialist

Developed by Google, Brotli is optimized specifically for web content like HTML, JavaScript, and CSS. Its clever trick is a static dictionary pre-loaded with common web strings—things like "DOCTYPE html" or standard English words. Unlike other compressors that must "learn" the data from scratch, Brotli has a head start. For small web files, this static dictionary allows it to compress 20-30% better than Gzip.
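
Brotli's built-in dictionary isn't exposed in Python's standard library, but zlib's preset-dictionary feature (`zdict`) illustrates the same head start on small inputs. In the sketch below, the preset contents are invented for illustration:

```python
import zlib

# A tiny stand-in for Brotli's roughly 120 KB built-in dictionary:
# strings the compressor knows about before seeing any input.
preset = b'<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><title>'

page = b'<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><title>Home</title>'

comp_cold = zlib.compressobj(level=9)
cold = comp_cold.compress(page) + comp_cold.flush()

comp_warm = zlib.compressobj(level=9, zdict=preset)
warm = comp_warm.compress(page) + comp_warm.flush()

print(f"raw={len(page)}  no dictionary={len(cold)}  preset dictionary={len(warm)}")

# Decompression must supply the same dictionary.
decomp = zlib.decompressobj(zdict=preset)
assert decomp.decompress(warm) + decomp.flush() == page
```

The cold compressor has to spell out the boilerplate; the warm one replaces it with a back-reference into the preset, which is exactly the advantage Brotli bakes in for web content.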

The AI Connection

The recent surge in interest in these algorithms is heavily driven by AI model distribution. Even after using "lossy" compression techniques like quantization (e.g., 4-bit GGUF models), the resulting files are still massive. Wrapping these quantized models in a lossless format like Zstandard can shave another 10-15% off the file size. For companies distributing models to millions of users, this saves petabytes of bandwidth.
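
As a rough illustration of that second, lossless pass, here is a sketch that packs simulated 4-bit weights and then compresses them, using the standard library's `lzma` as a stand-in for Zstandard (the distribution and sizes are invented for illustration):

```python
import lzma
import random

rng = random.Random(0)

# Simulated 4-bit quantized weights: values cluster near the middle of
# the 0-15 range, like the bell-shaped distributions in real models.
nibbles = [min(15, max(0, round(rng.gauss(8, 1.5)))) for _ in range(200_000)]

# Pack two 4-bit values per byte, roughly what a GGUF-style format does.
packed = bytes((nibbles[i] << 4) | nibbles[i + 1] for i in range(0, len(nibbles), 2))

# A lossless pass still finds redundancy in the skewed value distribution.
shrunk = lzma.compress(packed, preset=6)
print(f"packed={len(packed)} bytes  after lossless pass={len(shrunk)} bytes")
```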

Furthermore, advanced features like "dictionary training" allow Zstandard to be tailored to specific data types. By training the algorithm on a sample of data—like millions of similar JSON objects or specific AI weight distributions—companies can create custom dictionaries that make compression even more efficient for their unique use case.

As we move toward a future of massive data transfer and instant AI inference, the humble compression algorithm is more critical than ever. Zstandard is rapidly becoming the default choice for its balance of speed and size, but LZMA2 and Brotli remain essential tools for their specific niches. The invisible math of shrinking files is what keeps the internet flowing smoothly, one gigabyte at a time.


#1943: Why Tar Isn't Compression (And What Is)

Corn
You ever have that moment where you’re looking at a download bar for a hundred gigabyte AI model and thinking, there has to be a better way to live? I was staring at a progress bar yesterday and it hit me just how much we take for granted the invisible math that’s actually shrinking these files so they don’t break the entire internet.
Herman
It is the unsung hero of the modern stack, Corn. Honestly, without the advancements we’ve seen in the last decade from things like Zstandard and LZMA2, the distribution of large language models would be a nightmare. We’d be moving data at a glacial pace. By the way, Herman Poppleberry here, and just a quick note before we dive into the weeds—today’s episode is actually powered by Google Gemini 3 Flash.
Corn
Oh, nice. A little AI helping us talk about how to shrink AI. Circle of life. And today’s prompt from Daniel is right in that wheelhouse. He wants us to dig into ultra-efficient compression—specifically LZMA, Zstandard, and Brotli. He’s curious about how these things actually achieve those massive ratios compared to the old guard, and why we’re seeing such a surge in interest now that everyone is trying to shove seventy-billion-parameter models onto consumer hardware.
Herman
It’s a great prompt because there’s so much technical nuance that gets glossed over. Most people just right-click a folder, hit compress, and hope for the best. But the difference between using a legacy tool and a modern algorithm is the difference between a minor trim and a total reconstruction of the data.
Corn
Well, before we get into the heavy math, we have to address the elephant in the room—or the tape in the room. I feel like we need a public service announcement about "tar." I still see people saying "I tarred the file to make it smaller." Herman, please, for the love of all that is holy, explain why that drives you crazy.
Herman
It’s a fundamental misunderstanding of the plumbing! Tar stands for Tape Archive. It comes from the days when you were literally writing data to physical magnetic tapes. All tar does is take a bunch of individual files and glue them together into one long continuous stream. It preserves your folder structure and your permissions, but it doesn't move a single bit out of place to save space. If you have a gigabyte of text files and you tar them, you still have a gigabyte of data, just in one wrapper.
Corn
Right, it’s like putting ten boxes into one giant crate. The crate is easier to move, but it’s not smaller than the sum of the boxes.
Herman
Exactly. The magic happens when you follow that up with a compressor. That’s why you see dot tar dot gz. The tar utility bundles them, and then Gzip, the compressor, goes to work on that single stream. And here’s the "insider" tip: compressing a tarball is actually more efficient than compressing files individually.
Corn
Because the compressor can see the patterns across the whole batch?
Herman
Precisely. If you have a hundred C++ files, they probably all have the same header guards and include statements. If you compress them individually, the compressor has to "learn" those patterns a hundred times. If you tar them first, the compressor sees that header once, remembers it, and then for the next ninety-nine files, it just says "refer back to that first header." We call that "solid compression," and it’s a huge part of why modern archives are so much smaller than the sum of their parts.
Corn
Okay, so tar is the glue, Gzip is the shrink-wrap. But Daniel mentioned the "modern trio"—LZMA2, Zstandard, and Brotli. Gzip feels like the old reliable station wagon, but these others are like high-performance EVs. If I’m looking for the absolute smallest file size possible, I’m usually reaching for 7-Zip, which uses LZMA or LZMA2. What is Igor Pavlov doing in that engine that makes it so much better than the standard ZIP format?
Herman
LZMA2 is basically the king of the "ratio at all costs" world. The secret sauce is the dictionary size. To understand this, you have to think about how dictionary-based compression works. The algorithm reads through your data and builds a map of strings it has seen before. Gzip, which uses the Deflate algorithm, has a tiny "sliding window" or dictionary—only thirty-two kilobytes.
Corn
Thirty-two kilobytes? That’s like trying to memorize a book but only being able to remember the last two pages at any given time.
Herman
That is a perfect way to put it. If a pattern repeats three pages back, Gzip has already forgotten it. It has to treat it as new data. LZMA2, on the other hand, can have a dictionary size of up to one gigabyte. It can remember a pattern it saw eight hundred megabytes ago. When you’re compressing a massive software repository or a giant dataset, that huge memory allows it to find redundancies that Gzip simply cannot see.
Corn
But there’s no free lunch, right? If I’m telling my computer to keep a one-gigabyte dictionary in its head while it’s crunching numbers, my RAM is going to feel that.
Herman
Oh, it’ll scream. That’s why 7-Zip can be so slow and resource-heavy during the compression phase. It’s doing an incredible amount of work to find the most optimal way to represent those patterns. It uses a variant of the LZ77 algorithm combined with something called range coding. Range coding is a form of entropy coding that’s very similar to arithmetic coding—it basically compresses the description of the data down to the theoretical limit.
Corn
It’s funny you mention the "screaming fans" because I’ve definitely sat there watching 7-Zip take twenty minutes to compress something that Gzip would have finished in thirty seconds. But then the 7-Zip file is half the size. It’s a classic trade-off: you’re trading CPU cycles and time today for lower bandwidth and storage costs forever.
Herman
And that brings us to Zstandard, or zstd, which is probably the most impressive engineering feat in this space in the last twenty years. Yann Collet at Meta designed it to solve exactly the problem you just described. He wanted the compression ratios of LZMA but the speed of Gzip.
Corn
That sounds like "I want a Ferrari that gets sixty miles per gallon." How do you actually pull that off?
Herman
You do it by rethinking how the entropy coding works. Zstandard uses something called Finite State Entropy, or FSE. Without getting too bogged down in the calculus, historically you had two choices: Huffman coding, which is fast but not very efficient, or Arithmetic coding, which is very efficient but incredibly slow because it requires complex multiplications and divisions for every single bit. FSE is a "taming" of that math. It gives you the efficiency of arithmetic coding but uses a state-machine approach that’s almost as fast as Huffman.
Corn
So it’s basically a math shortcut that gets you ninety-five percent of the way to perfection with ten percent of the effort.
Herman
That’s the core of it. And Zstandard is incredibly flexible. It has compression levels from one to twenty-two. At level one, it’s faster than Gzip and usually compresses better. At level twenty-two, it starts to rival LZMA for size, though it slows down significantly. It’s become the gold standard for things like real-time log compression or distributing Linux packages because it’s so versatile.
Corn
I’ve noticed a lot of the AI crowd moving toward zstd for model weights. When you’re shipping a Safetensors file or a GGUF, saving even five or ten percent on a hundred-gigabyte file is massive. But I also want to ask about Brotli, because that’s the third one Daniel mentioned. I usually hear about Brotli in the context of web browsers. Is it just for HTML?
Herman
It’s optimized for text, specifically web content like HTML, JavaScript, and CSS. Google developed it with a very clever trick: a static dictionary. Most compressors start with a "blank brain" and have to learn the data as they go. Brotli comes pre-installed with a dictionary of common web strings—things like "DOCTYPE html" or "javascript" or common English words.
Corn
So it doesn't have to waste space "learning" that the word "function" exists in a JavaScript file. It already knows the code for "function."
Herman
Exactly, it has a head start. For a small file like a website’s homepage, that static dictionary makes a huge difference because the compressor doesn’t have enough data to build a good dynamic dictionary on its own. In those cases, Brotli can be twenty to thirty percent better than Gzip.
Corn
It’s wild how specific these tools have become. You’ve got LZMA for the heavy archives, Zstandard for the "all-rounder" speed demon, and Brotli for the web. But let's look at the "why" behind this surge in interest. Daniel pointed out that AI model distribution is a huge driver. I was looking into the "cold start" problem in serverless AI—you know, when a cloud provider has to spin up a GPU instance and download the model weights before it can answer your prompt.
Herman
That is the bottleneck of the decade. If you’re running a serverless function, the actual inference—the AI thinking—might take two seconds. But if it takes thirty seconds to download the model from an S3 bucket and decompress it, the user experience is terrible. This is where Zstandard is winning, because its decompression speed is lightning fast.
Corn
Right, because LZMA might give you a smaller file, but if your CPU spends two minutes decompressing it, you’ve lost all the time you saved on the download. Zstandard is designed for massive parallelism. You can decompress it across multiple CPU cores almost instantly.
Herman
And we should clarify the difference between what we’re talking about—lossless compression—and "quantization," which is the other big buzzword in AI. When people talk about "4-bit GGUF" models, they’re talking about lossy compression. They’re actually changing the weights of the model, rounding numbers off to save space. It’s like a JPEG for AI.
Corn
Right, you’re losing a bit of "intelligence" to gain a lot of space. But even after you quantize a model down to four bits, you still wrap it in something like Zstandard for the actual trip across the internet.
Herman
You have to. Even a quantized model has patterns. If you have a file full of 4-bit integers, there’s still redundancy. Zstandard can shave another ten or fifteen percent off that already-shrunken file. When you’re Meta and you’re distributing Llama models to millions of developers, that’s petabytes of bandwidth saved.
Corn
I saw a report from Cloudflare recently—I think Daniel might have mentioned this too—where they found Zstandard was forty-two percent faster than Brotli for certain types of web traffic while keeping the same compression ratio. It feels like we’re seeing a bit of a "format war," but one where Zstandard is slowly eating everything.
Herman
It’s hard to beat the versatility. One clever feature of Zstd that I love is "dictionary training." If you have a specific type of data—let’s say you’re a company that stores millions of small JSON objects that all look similar—you can "train" Zstandard on a sample of those objects. It creates a custom dictionary that you distribute to both the sender and the receiver. Then, you can compress those tiny JSON objects individually as if they were part of one giant file.
Corn
That’s basically what Brotli does for the web, but Zstandard lets you do it for anything. You could have a custom dictionary for genomic data, or for financial transactions, or for specific AI weight distributions.
Herman
It’s the next frontier of efficiency. In fact, there’s a new standard being pushed called "Compression Dictionary Transport" for browsers. Imagine if your browser stored a dictionary for a specific site you visit often. When the site updates its code, the server doesn't send the new file—it just sends the "delta," the tiny differences, and your browser uses its stored dictionary to reconstruct the new version. We’re talking about ninety percent reductions in payload size for returning users.
Corn
It’s like having a secret codebook between you and the server. I love that. It makes the internet feel a lot more personal and efficient. But Herman, let's talk about the downside. There’s a reason Gzip is still everywhere. If I’m a developer, why wouldn't I just use Zstandard for everything starting today? Is it just compatibility?
Herman
Mostly, yes. Gzip is the "English language" of the internet. Every browser, every operating system, every tiny embedded microcontroller understands Gzip. If you use Zstandard, you have to make sure the other end has the library installed to decompress it. But that’s changing fast. The Linux kernel uses Zstd now. Android uses it. Most modern browsers have added support for it in the last year or two. We’re reaching the tipping point where "the Gzip tax"—the extra money you pay for storage and bandwidth because you’re using an old algorithm—is becoming harder to justify.
Corn
"The Gzip tax." I like that. It’s like paying for a premium cable package when you only watch one channel. You’re paying for bits that don't need to exist. I want to go back to the 7-Zip thing for a second, though. Why is it that when I use 7-Zip on my laptop, it feels like the computer is about to take off? You mentioned the dictionary size, but is there more to it?
Herman
It’s the "match finding" process. To find a pattern that repeated five hundred megabytes ago, the algorithm has to do a lot of searching. It uses complex data structures like hash chains or binary trees to keep track of where different strings have appeared. Searching through those structures while you’re also trying to encode the data is incredibly CPU-intensive. LZMA2 improved on the original LZMA by allowing for better multi-threading, but it’s still fundamentally an "exhausting" algorithm. It’s trying to find the absolute best mathematical representation, whereas Zstandard is looking for a "good enough" representation that it can find quickly.
Corn
It’s the difference between a master painter spending six months on a portrait and a high-end digital camera taking a photo in a fraction of a second. Both look great, but one is much more practical for daily use.
Herman
That’s a fair analogy. And for archival storage—like if you’re backing up your family photos or old project files to a cold-storage drive—LZMA2 is still the winner. If it takes an hour to compress but saves you twenty gigabytes of space on a drive that’s going to sit in a drawer for five years, that’s a win. But for the "living" internet, speed is king.
Corn
I wonder about the future of this. We’re seeing more hardware acceleration now. Intel has their QuickAssist Technology, or QAT, which is basically a dedicated slice of the CPU just for compression and encryption. Does that change the game? If the hardware can do the heavy lifting, do we stop caring about the complexity of the algorithm?
Herman
It definitely shifts the math. If you have a dedicated chip that can do LZMA-level compression at Gzip speeds, then the software trade-offs become less relevant. But we’re also seeing a move toward specialized compression for neural networks. Things like "Weight-Only Quantization" combined with entropy coding that understands the distribution of numbers in a transformer model.
Corn
Right, because weights in an AI model aren't random. They usually follow a bell curve or some other predictable distribution. If the compressor knows that, it can be even more efficient.
Herman
We’re moving away from "general purpose" compression and toward "context-aware" compression. It’s why Brotli is so good at text and why the future of AI distribution will likely involve compressors that "understand" the structure of a neural network.
Corn
It’s fascinating that as our data gets more complex, the math we use to hide that complexity has to get just as sophisticated. I mean, we’re talking about algorithms that are essentially trying to solve the problem of "how much information is actually in this pile of bits?"
Herman
It’s Claude Shannon’s information theory coming to life. Every file has an "entropy limit"—the absolute minimum number of bits required to represent that information. We’re just getting closer and closer to that limit. Gzip was a great first step, but LZMA and Zstandard are pushing us toward the theoretical edge.
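
Herman’s "entropy limit" has a simple order-0 form, where each byte is coded independently according to its frequency. The short sketch below shows that Deflate can actually beat this naive estimate on repetitive data, precisely because real compressors also exploit cross-byte patterns that the order-0 model ignores:

```python
import math
import zlib
from collections import Counter

def entropy_bits_per_byte(data: bytes) -> float:
    """Order-0 Shannon entropy: bits per byte if every byte is coded
    independently according to its frequency."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

text = b"the quick brown fox jumps over the lazy dog " * 200

h = entropy_bits_per_byte(text)
order0_limit = h * len(text) / 8       # bytes, ignoring cross-byte structure
actual = len(zlib.compress(text, 9))   # Deflate exploits the repetition too

print(f"order-0 limit={order0_limit:.0f} bytes  deflate output={actual} bytes")
```
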
Corn
Well, before we wrap up the technical deep dive, I have to ask: do you have a favorite? If you were stranded on a desert island with a one-terabyte hard drive and one compression utility, what are you picking?
Herman
It has to be Zstandard. The sheer range of it—from ultra-fast to ultra-dense—makes it the only tool you really need. Plus, the fact that it’s open-source and has such a brilliant community around it means it’s only going to get better. What about you, Corn? You’re the one who likes to poke fun at the slow progress bars.
Corn
I think I’m a 7-Zip guy at heart. There’s something satisfying about seeing a file size drop by sixty percent and knowing that my CPU worked hard for that. It feels "earned." Plus, I like the deadpan simplicity of the interface. It hasn't changed in twenty years and it doesn't need to.
Herman
Spoken like a true sloth. You value the result, even if it takes a while to get there.
Corn
Hey, efficiency comes in many forms. Sometimes being efficient means taking the time to do it right once so you never have to worry about it again. But let’s move into some practical takeaways for the folks listening who maybe aren't building their own AI models but still want to stop paying "the Gzip tax."
Herman
The biggest one is simple: if you’re a developer or a sysadmin still using Gzip for your backups or your internal data transfers, stop. Switch to Zstandard. Most modern tools, including the "tar" command itself, now have built-in support for it. Instead of typing "tar dash czvf," you use the zstd flag: "tar dash dash zstd dash cvf." You’ll get faster speeds and smaller files almost instantly.
Corn
And for the average user, if you’re sending a big batch of files to someone, don't just use the default "Compress" option in Windows or macOS. Download a tool like 7-Zip or NanaZip. Use LZMA2 if you’re sending it over a slow connection, or Zstandard if you just want it done quickly. The difference in file size can be the difference between an email attachment bouncing or going through.
Herman
Also, keep an eye on your browser’s "Network" tab in the developer tools. If you see "br" in the content-encoding header, that’s Brotli. If you’re a web developer and you haven't enabled Brotli on your server, you’re literally making your users wait longer for no reason. It’s a five-minute config change that can improve your site’s performance by twenty percent.
Corn
I love that. It’s one of those rare "win-win" scenarios. The server does less work, the user waits less time, and the environment benefits because we’re spinning fewer disks and burning less electricity to move those bits.
Herman
It really is. And for the AI enthusiasts out there, pay attention to the compression formats used in libraries like Hugging Face’s "safetensors." We’re seeing a real push toward making models "streamable," where you can start running the model while it’s still decompressing in the background. That’s only possible with modern, block-based compression like Zstd.
Corn
It’s a brave new world of tiny files. Honestly, after talking this through, I feel a lot better about that hundred-gigabyte download. At least I know someone spent a lot of time making sure it wasn't a hundred and fifty gigabytes.
Herman
Every bit counts, Corn. Every bit counts.
Corn
Well, I think we’ve squeezed about as much as we can out of this topic—pun absolutely intended, and I’m not even sorry.
Herman
I walked right into that one.
Corn
You really did. But seriously, this has been a great deep dive. Thanks to Daniel for the prompt—it’s one of those topics that’s so foundational but so invisible that it’s easy to overlook until you really look at the math.
Herman
It’s the plumbing of the internet. And as long as people keep making bigger and bigger models, we’re going to need bigger and better wrenches to keep the data flowing.
Corn
Before we sign off, huge thanks to our producer, Hilbert Flumingtop, for keeping the show running smoothly. And a big thanks to Modal for providing the GPU credits that power our research and the generation of this very script.
Herman
If you’re enjoying "My Weird Prompts," we’d love it if you could leave us a review on Apple Podcasts or Spotify. It’s the single best way to help other curious nerds find the show.
Corn
You can also find us at myweirdprompts dot com for our full archive and the RSS feed. We’ve got over eighteen hundred episodes in there, so if you’re looking for a deep dive on anything from file systems to cassette tapes, we’ve probably covered it.
Herman
This has been My Weird Prompts. We’ll be back next time with another deep dive into whatever's on Daniel's mind.
Corn
Stay curious, and maybe try 7-Zip today. Your hard drive will thank you.
Herman
Goodbye, everyone.
Corn
See ya.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.