You ever zoom into a digital photo of a clear blue sky and notice those weird, blocky squares where everything should be smooth? It looks like a Minecraft character started melting into the horizon. Those are the ghosts of nineteenth-century mathematics haunting your smartphone, and honestly, they’re costing us a fortune in bandwidth and storage.
It is the classic struggle between visual fidelity and the cold, hard reality of data limits. Today's prompt from Daniel is about the evolution of digital image files, tracing the path from the ubiquitous JPEG to modern heavy hitters like WebP and AVIF. It is a journey through psychovisual models and some honestly brilliant engineering.
Well, I’m glad we’re tackling this because my cloud storage bill is starting to look like a mortgage payment. By the way, quick heads-up for the listeners—today’s episode is powered by Google Gemini 3 Flash. It’s the brain behind the curtain for this specific deep dive. So, Herman Poppleberry, why are we still living in a world of blocky artifacts in twenty twenty-six? Haven't we solved pictures yet?
We’ve solved them, but the "how" is constantly shifting. When we talk about image compression, we aren't just talking about making a file smaller. We are talking about a sophisticated balancing act involving three variables: encoding speed, decoding speed, and visual quality. JPEG was the king for thirty years because it hit a sweet spot that worked on the hardware of the nineties, but our displays have outpaced the math.
It’s wild to think JPEG is from nineteen ninety-two. That’s the same year the first text message was sent. We’ve gone from "u up?" to high-definition video calls, yet the primary way we store memories is still based on tech from the era of dial-up modems and floppy disks.
It’s a testament to how good the original JPEG standard actually was. It’s based on something called the Discrete Cosine Transform, or DCT. To understand why your sky looks blocky, you have to understand that JPEG doesn't see an image as a whole. It breaks the entire picture into eight-by-eight pixel blocks.
Eight by eight? That seems tiny. Why such a specific, rigid number? Why not just look at the whole image at once?
Because in nineteen ninety-two, the processors we had—like the Intel 486—would have literally caught fire trying to calculate the frequencies of a million-pixel image all at once. By breaking it into sixty-four-pixel chunks, the math becomes manageable. It’s a "divide and conquer" strategy. Within those sixty-four pixels, the algorithm is performing a mathematical ritual to decide what your eyes can actually see. It converts those spatial pixels into frequency components. Think of it like a musical chord. A complex image has high-frequency "notes"—sharp edges, fine textures—and low-frequency "notes"—broad washes of color.
And since humans are famously bad at seeing fine detail in high-frequency areas, the algorithm just... tosses them in the bin?
Well, not exactly—sorry, I shouldn't say that. It quantizes them. It reduces the precision of those high-frequency components. If you have a slider in your photo editor that says "Quality: seventy percent," what you’re actually doing is telling the quantization table to be more aggressive about rounding those numbers down to zero. When you push it too far, the math can no longer bridge the gap between those eight-by-eight blocks, and that is where the "blocking artifacts" come from. The edges of the math become visible to the naked eye.
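For listeners who want to poke at this, here is a toy version of the loop Herman just described, using scipy's DCT routines. The step table below is invented for illustration (real JPEG quantization tables are tuned to human vision), but the rounding is the same idea:

```python
# Toy JPEG core: 2-D DCT on one 8x8 block, then quantization that
# rounds high-frequency coefficients toward zero. The step table is
# illustrative, not the actual JPEG quantization table.
import numpy as np
from scipy.fft import dctn, idctn

block = np.random.randint(0, 256, (8, 8)).astype(float) - 128  # center on zero

coeffs = dctn(block, norm="ortho")  # spatial pixels -> frequency "notes"

# Crude quality knob: lower quality means bigger steps, so more
# high-frequency coefficients get rounded all the way to zero.
quality = 0.7
steps = (np.add.outer(np.arange(8), np.arange(8)) * 12 + 8) / quality

quantized = np.round(coeffs / steps)              # the lossy part
restored = idctn(quantized * steps, norm="ortho")

print(f"coefficients kept: {np.count_nonzero(quantized)}/64")
print(f"max pixel error after round trip: {np.abs(block - restored).max():.1f}")
```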
It’s like a painter who only has a certain amount of paint, so they spend all of it on the person’s face and then just use a big, dirty sponge for the background. It looks fine from across the room, but if you get close, you see the sponge marks.
That’s a perfect analogy. And for thirty years, we just accepted the sponge marks because the alternative was a file so large it would take five minutes to download. It was the ultimate "fake it till you make it" strategy, and it worked! It’s the reason the early web didn’t take forty minutes to load a single headshot.
But we’ve moved past the "Holy Trinity" of JPEG, PNG, and GIF now, right?
We definitely have, though it took a surprisingly long time to dethrone the king. For a long time, if you wanted transparency, you had to use PNG, which is lossless but produces massive files. If you wanted animation, you used GIF, which is limited to two hundred fifty-six colors and is essentially a digital fossil at this point. Then, in twenty ten, Google entered the chat with WebP.
Ah, WebP. The format that every time I try to "Save Image As" from a browser, I get annoyed because my local image viewer from twenty-fifteen doesn't know what to do with it.
You’ve got to update your viewer, Corn! But your frustration is actually a sign of its success. WebP was a brilliant bit of recycling. Google took the intra-frame coding technology from the VP8 video codec and wrapped it into an image format. It brought a concept called "predictive coding" to the table.
Predictive coding sounds like something an AI would do. Is the file trying to guess what I’m looking at?
In a way, yes. Instead of just looking at an eight-by-eight block in isolation like JPEG, WebP looks at the blocks it has already encoded and tries to predict what the next block will look like. It says, "Hey, the last three blocks were blue sky, I bet this one is too." Then, it only stores the "residual"—the difference between its guess and the actual pixels.
So it’s like if I’m giving you directions. Instead of saying "Turn left on Main Street, go three blocks, then turn left on Oak," I just say "Do what you did last time, but go one block further."
Exactly, and it’s significantly more efficient than just throwing data away. It’s like the difference between someone telling you the whole story from the beginning and someone just giving you the updates since the last time you talked. Because of that predictive jump, WebP can get files twenty-five to thirty-five percent smaller than a JPEG of the same visual quality. And it supports transparency like a PNG and animation like a GIF. It was the first real "Swiss Army Knife" of image formats.
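Here is the "store only the difference" trick in miniature, on a single row of pixels with the simplest possible predictor. WebP's real predictors work on blocks and have several modes; this only shows the principle:

```python
# Toy predictive coding: guess each pixel from its left neighbor and
# keep only the residual. Smooth data (blue sky) leaves residuals that
# are mostly zeros and ones, which entropy coders compress very well.
import numpy as np

row = np.array([200, 201, 201, 202, 203, 203, 204, 205])  # a smooth gradient

prediction = np.concatenate(([0], row[:-1]))  # "same as the last pixel"
residual = row - prediction                   # this is what gets stored

print("raw values:", row)
print("residuals: ", residual)

decoded = np.cumsum(residual)                 # the decoder rebuilds exactly
assert np.array_equal(decoded, row)
```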
So why did it take a decade for it to become the standard? I feel like I only started seeing WebP everywhere in the last five or six years.
Browser wars, mostly. Safari was the big holdout. Apple didn't add WebP support until iOS fourteen and macOS Big Sur back in late twenty-twenty. Until then, web developers had to serve JPEGs to iPhone users and WebP to Chrome users. Once Apple blinked, the floodgates opened.
It’s always Safari, isn't it? The speed bump on the highway of progress. But even as WebP was taking over, something else was lurking in the shadows. You mentioned AVIF earlier. If WebP is the Swiss Army Knife, what is AVIF? A laser-guided scalpel?
Pretty much. AVIF is the AV1 Image File Format, and it’s derived from the AV1 video codec—which is the current state-of-the-art for video. If WebP was a twenty-five percent improvement over JPEG, AVIF is a fifty percent improvement. We are talking about taking a two-megabyte high-res photo and crushing it down to three hundred kilobytes without your eyes ever knowing the difference.
Fifty percent? That’s massive. If I’m a developer running a site with thousands of images, cutting my payload in half is the difference between a snappy user experience and someone bouncing because the page took three seconds to load on a 4G connection.
It’s a game changer for mobile performance. We actually saw a case study recently from a major e-commerce platform. They switched their entire catalog to AVIF with a WebP fallback. They saw a thirty percent reduction in total image weight across the site, which translated to a one-point-two-second improvement in "Largest Contentful Paint" on mobile devices. In the world of e-commerce, a one-second faster load time can literally mean millions of dollars in recovered conversions.
I love that we’re at a point where "better math" results in "more money." But what’s the catch? There’s always a catch. If AVIF is so much better, why haven't we deleted every JPEG on the planet? Is it just compatibility?
Compatibility is part of it, but the real catch is computational cost. AVIF is incredibly complex to encode. It’s using advanced tools like "chroma-from-luma" prediction and far more flexible block partitioning. It doesn’t just stick to eight-by-eight; blocks can range from four-by-four all the way up to one hundred twenty-eight-by-one hundred twenty-eight. It takes a lot of CPU cycles to figure out those fifty-percent savings. If you’re a photographer trying to export a thousand wedding photos, saving them as AVIF might take five times longer than saving them as JPEG.
So, it’s a trade-off between the server’s time and the user’s time. If I’m Instagram, I’m happy to spend the extra processing power once to save the bandwidth for a billion users. But if I’m just a guy on a laptop, I might stick to what’s fast.
Mostly. But as of twenty twenty-six, hardware acceleration for AV1 is becoming standard in most phone chips and PC processors. So that "cost" is shrinking every day. However, there is another player in the room that photographers actually care about more than AVIF, and that is JPEG XL.
JPEG XL? That sounds like the "Extra Large" version of the old format. Is it just JPEG with more pixels?
No, the "XL" actually stands for "Long-term." It’s a completely new architecture, and it’s fascinating because it solves a problem that WebP and AVIF actually ignore: legacy migration.
Talk to me about legacy migration. Are we talking about converting my old vacation photos? Because I have a hard drive from two thousand eight that is basically a graveyard of low-res JPEGs.
Yes, but in a way that feels like magic. JPEG XL has a feature called "lossless transcoding." You can take an existing JPEG—one of those old, blocky ones from ten years ago—and convert it to a JPEG XL file. It will be twenty percent smaller, but it is mathematically identical to the original. You aren't "re-compressing" it and losing more quality; you’re just repacking the data more efficiently. And here’s the kicker: you can convert it back to the original JPEG at any time, bit-for-bit.
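You can check that round trip yourself with libjxl's command-line tools, cjxl and djxl. A minimal sketch, assuming both are on your PATH and using a placeholder filename (exact defaults can vary by libjxl version):

```python
# Lossless JPEG -> JPEG XL -> JPEG round trip, verified by hashing.
# cjxl transcodes an existing JPEG losslessly by default; djxl can
# reconstruct the original JPEG bit for bit from the .jxl file.
import hashlib
import subprocess

def sha256(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

subprocess.run(["cjxl", "vacation.jpg", "vacation.jxl"], check=True)
subprocess.run(["djxl", "vacation.jxl", "restored.jpg"], check=True)

print("bit-for-bit identical:", sha256("vacation.jpg") == sha256("restored.jpg"))
```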
Wait, that’s huge. So I can save twenty percent of my storage space today, and if for some reason the world decides to go back to nineteen ninety-two, my files are still "original"? It’s like a zip file that stays an image.
Precisely. Well—there I go again with the "P" word. Let's just say, that is exactly why the professional archiving and photography communities are obsessed with it. JPEG XL also handles ultra-high resolutions, over a billion pixels on a side, and it supports high bit depths like ten, twelve, or even thirty-two-bit float for HDR. AVIF is great for the web, but JPEG XL is looking like the future of high-end digital imaging.
But wait, if JPEG XL is so good, why did I read that it was struggling? I remember hearing that Google Chrome actually removed support for JPEG XL a few years back. It felt like they were trying to kill it off to favor their own formats like WebP or AVIF.
They did. In twenty-twenty-two, they pulled the experimental support, claiming there wasn't enough interest, which caused a huge outcry in the developer community. It was a very controversial move because JPEG XL is actually faster to decode than AVIF and has better features for high-end photography. But as of the last year or so, we’ve seen a massive resurgence. Professional tools like Adobe Photoshop and the major camera manufacturers are starting to bake it in. It’s a classic battle between "efficiency for the web" versus "quality for the creator."
It feels like the "Image Wars." We’ve got AVIF in the red corner, backed by the web performance crowd, and JPEG XL in the blue corner, backed by the photographers. Who wins? Or do we just end up in a world where we need twelve different codecs to open a meme? Is there a risk of "format exhaustion" where we just give up and go back to screenshots?
The winner, honestly, is the "Image CDN." If you’re a user browsing the web in twenty twenty-six, you probably don't even know what format you’re looking at. Services like Cloudinary or Akamai use something called "Content Negotiation." When your browser requests an image, the server looks at your "Accept" header. It says, "Oh, Corn is using a modern browser that supports AVIF? Cool, I’ll send him the tiny AVIF version." If you’re on some ancient tablet that only speaks JPEG, it sends you the old-school version.
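The decision itself is almost embarrassingly simple. A minimal sketch of the server-side logic, with the caching and on-the-fly transcoding a real CDN does left out:

```python
# Content negotiation in miniature: pick the best image format the
# client's Accept header admits, most efficient first.
def pick_format(accept_header: str) -> str:
    for mime, ext in [("image/avif", "avif"), ("image/webp", "webp")]:
        if mime in accept_header:
            return ext
    return "jpg"  # the universal fallback

# A modern browser advertises AVIF support in its Accept header...
print(pick_format("image/avif,image/webp,image/apng,*/*;q=0.8"))  # avif
# ...while an ancient tablet only speaks the classics.
print(pick_format("image/png,image/*;q=0.8,*/*;q=0.5"))           # jpg
```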
So the file extension on the end of the URL is basically a lie. I could be looking at a ".jpg" that is actually an AVIF under the hood?
It’s a suggestion. The server is doing the heavy lifting behind the scenes. And this is where it gets really interesting when we look at the "second-order effects" Daniel’s prompt hints at. This isn't just about making pages load faster. It’s about sustainability.
Sustainability? I didn't think my cat photos were killing the planet, Herman. Are we really saying a smaller file size saves the polar bears?
Think about the scale, though. Images account for roughly half the bytes of a typical web page, and the web serves an unimaginable number of pages every day. Every time you scroll through a social media feed, you’re downloading megabytes of image data. If we can cut that roughly in half globally by moving from JPEG to AVIF or JPEG XL, we are talking about a massive reduction in the electricity required to power data centers and transmission lines. Smaller files mean less cooling, less bandwidth, and less hardware. Efficiency is a "green" technology. It’s estimated that moving the entire web to modern image formats could save as much energy as taking thousands of cars off the road.
That’s a perspective I hadn't considered. Every time I use a more efficient codec, I’m technically saving a tiny bit of the power grid. I feel more virtuous already. But let’s get practical for a second. If I’m a designer or a developer listening to this, what should I actually be doing? Should I go out and convert every PNG on my site to an AVIF right now?
Not manually. That’s a recipe for a headache. The first takeaway is to use the HTML "picture" element. It’s built for this. You can list your AVIF file as the primary source, then a WebP file as the second source, and finally a JPEG as the fallback. The browser will automatically pick the best one it can handle. It’s the "set it and forget it" of web development.
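For reference, the markup looks something like this, with placeholder filenames. The browser walks the sources top to bottom and uses the first type it supports:

```html
<picture>
  <source srcset="hero.avif" type="image/avif">
  <source srcset="hero.webp" type="image/webp">
  <!-- every browser, no matter how old, can render this fallback -->
  <img src="hero.jpg" alt="Product hero shot" width="1200" height="800">
</picture>
```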
And what about the quality settings? I’ve always been told "save at eighty percent and you’re fine." Does that rule still apply to these new formats? Or do we need a new rule of thumb?
Not really. The "quality" scale in AVIF isn't the same as JPEG. An AVIF at "thirty" might look better than a JPEG at "eighty." My advice is to use tools like Squoosh—which is a web app Google put out—to actually do a side-by-side comparison. You can slide a divider across the image and see exactly where the artifacts start to appear. Most people find they can push AVIF much lower than they think, saving a ton of space without any visible degradation.
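Since the quality scales don't line up across formats, measuring beats guessing. A tiny sketch with Pillow's WebP encoder and a placeholder filename (swap in AVIF if your Pillow build supports it):

```python
# Sweep an encoder's quality knob and watch the file size; pair this
# with a Squoosh-style visual check to find your own sweet spot.
import io
from PIL import Image

img = Image.open("photo.jpg")  # placeholder filename

for q in (30, 50, 80):
    buf = io.BytesIO()
    img.save(buf, format="WEBP", quality=q)
    print(f"quality {q}: {buf.tell() / 1024:.0f} KiB")
```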
I’ve used Squoosh! It’s actually quite fun. It’s like a magnifying glass for math errors. But what about the AI side of this? You mentioned earlier that AI is starting to get involved in how we compress things. Is the future just an AI looking at my photo and saying, "I know what a tree looks like, I’ll just delete the tree and recreate it on the user’s end"?
We are actually seeing that. There is a concept called "Neural Compression." Instead of using hand-written mathematical formulas like DCT, we use a neural network. The encoder "learns" a compact representation of the image, and the decoder—on your phone—uses another neural network to "reconstruct" it. It’s similar to how DLSS works in video games, where the computer renders at a low resolution and AI "fills in" the missing detail.
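A bare-bones sketch of that encoder/decoder idea in PyTorch. Real neural codecs add entropy models and rate-distortion training; this untrained skeleton just shows where the compression happens:

```python
# A tiny convolutional autoencoder: the encoder squeezes the image into
# a small latent, quantization makes it cheap to store (and lossy), and
# the decoder reconstructs pixels from the latent alone.
import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 8, 4, stride=2, padding=1),   # 3 channels -> 8, at 1/4 size
)
decoder = nn.Sequential(
    nn.ConvTranspose2d(8, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
)

image = torch.rand(1, 3, 64, 64)        # stand-in for a real photo
latent = encoder(image)
code = torch.round(latent * 16) / 16    # quantize: this is the lossy step
reconstruction = decoder(code)

print("pixel values in:", image.numel(), "-> latent values stored:", code.numel())
```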
That sounds dangerous. What if the AI thinks my dog is a cat and "reconstructs" him with pointy ears? Or what if it decides my uncle needs a better hairline in the family reunion photo?
That is the "hallucination" risk. If the compression is too aggressive, the AI might fill in details that weren't there. It might turn a grainy texture into a smooth one or change the shape of a distant face. It’s why neural compression is currently mostly used for things like video calls where a little bit of "faking it" is okay for the sake of fluid motion. For archiving your family photos, we’re still sticking to the "deterministic" math of things like JPEG XL.
Good. I don't want my grandkids looking at photos of me and wondering why I have six fingers because the compression algorithm thought it looked "more natural." So, looking ahead, where does this go? Are we going to see a "JPEG XXL" in five years? Or is the evolution starting to plateau?
I think we’re reaching the limits of "lossy" compression where the gains are becoming marginal to the human eye. The next frontier is "semantic compression."
Semantic compression. Explain that to me like I’m a sloth who’s only had one cup of coffee.
It’s compression based on meaning. Imagine a photo of you sitting in a park. A semantic compressor identifies "Corn," "bench," "grass," and "trees." It realizes that the grass doesn't need to be perfectly sharp for the image to work. It compresses the grass heavily but keeps the textures of your face perfectly clear. It’s a "content-aware" approach that prioritizes what humans actually care about in a scene.
So it’s basically an automated version of what a photographer does when they use a shallow depth of field. Keep the subject sharp, let the rest be a blur of efficient data. Does it know to prioritize things like text or faces?
It uses saliency maps to determine which parts of the image will draw the human eye first. And when you combine that with the HDR capabilities of something like AVIF or JPEG XL, the results are stunning. We’re moving away from "flat" images to ones that can actually represent the true brightness of the sun or the deep shadows of a forest. JPEG was an eight-bit format. It could only represent two hundred fifty-six levels of brightness per channel. Modern formats go way beyond that, supporting ten-bit or twelve-bit color, which means billions of colors instead of millions.
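Here is a crude approximation of that content-aware idea using Pillow: compress the whole frame hard, then paste the salient region back in at high quality. The filename and the box are placeholders; a real system would get the region from a saliency model, and a real semantic codec would vary quantization inside a single stream instead of pasting:

```python
# Spend the bits where the eyes land: brutal quality everywhere, high
# quality on the region a saliency model says people will look at.
import io
from PIL import Image

def recompress(im, quality):
    buf = io.BytesIO()
    im.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf)

img = Image.open("park.jpg")        # placeholder filename
face_box = (400, 120, 560, 300)     # placeholder; from a saliency map

cheap = recompress(img, quality=15)                 # the grass can be mushy
sharp = recompress(img.crop(face_box), quality=90)  # the face cannot

cheap.paste(sharp, face_box[:2])    # drop the sharp crop back in place
cheap.save("park_semantic.jpg")
```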
It’s funny how we’ve spent thirty years trying to make digital images look as good as film, and now that we’ve finally surpassed it, we’re spending all our energy trying to make those files small enough to send to someone in half a second.
It’s the cycle of tech. More power leads to more data, which leads to a need for better compression, which frees up more power. But the real takeaway for everyone listening is that the "defaults" are changing. If you’re still using JPEG for everything, you’re basically leaving money—and battery life—on the table.
I wonder if we’ll ever reach a point where the file format is just... "Image." Like, the computer just handles whatever math is necessary and we never have to see a three-letter extension again.
We're getting closer. With containers like HEIF—which is what iPhones use—you can actually store multiple versions of an image, or even an image and a video, in one file. The "extension" is becoming less about the math and more about the container.
Well, I’m going to go audit my website and see how many "legacy" JPEGs I can hunt down. It’s time to move into the twenty-twenties, even if it’s twenty twenty-six already. Herman, this has been an eye-opener. I’ll never look at a blocky sky the same way again. I'll just see it as a cry for help from a nineteen ninety-two algorithm.
Just remember, those blocks are just the math saying "I give up." With AVIF and JPEG XL, the math never has to give up. It’s got a lot more tools in the toolbox now.
A beautiful sentiment to end on. If you’ve enjoyed this deep dive into the pixels, do us a favor and leave a review on your favorite podcast app. It really helps other nerds find the show.
Big thanks to our producer, Hilbert Flumingtop, for keeping the gears turning behind the scenes.
And a huge shout-out to Modal for providing the GPU credits that power this whole operation. We couldn't do it without them.
This has been My Weird Prompts. You can find us at myweirdprompts dot com for all the links and ways to subscribe.
Catch you in the next one. Stay sharp—or at least, stay efficiently compressed.
See ya.