You ever zoom into a digital photo of a clear blue sky and notice those weird, blocky squares where everything should be smooth? It looks like a Minecraft character started melting into the horizon. Those are the ghosts of nineteenth-century mathematics haunting your smartphone, and honestly, they’re costing us a fortune in bandwidth and storage.
It is the classic struggle between visual fidelity and the cold, hard reality of data limits. Today's prompt from Daniel is about the evolution of digital image files, tracing the path from the ubiquitous JPEG to modern heavy hitters like WebP and AVIF. It is a journey through psychovisual models and some honestly brilliant engineering.
Well, I’m glad we’re tackling this because my cloud storage bill is starting to look like a mortgage payment. By the way, quick heads-up for the listeners—today’s episode is powered by Google Gemini 3 Flash. It’s the brain behind the curtain for this specific deep dive. So, Herman Poppleberry, why are we still living in a world of blocky artifacts in twenty twenty-six? Haven't we solved pictures yet?
We’ve solved them, but the "how" is constantly shifting. When we talk about image compression, we aren't just talking about making a file smaller. We are talking about a sophisticated balancing act involving three variables: encoding speed, decoding speed, and visual quality. JPEG was the king for thirty years because it hit a sweet spot that worked on the hardware of the nineties, but our displays have outpaced the math.
It’s wild to think JPEG is from nineteen ninety-two. That’s the same year the first text message was sent. We’ve gone from "u up?" to high-definition video calls, yet the primary way we store memories is still based on tech from the era of dial-up modems and floppy disks.
It’s a testament to how good the original JPEG standard actually was. It’s based on something called the Discrete Cosine Transform, or DCT. To understand why your sky looks blocky, you have to understand that JPEG doesn't see an image as a whole. It breaks the entire picture into eight-by-eight pixel blocks.
Eight by eight? That seems tiny. Why such a specific, rigid number? Why not just look at the whole image at once?
Because in nineteen ninety-two, the processors we had—like the Intel 486—would have literally caught fire trying to calculate the frequencies of a million-pixel image all at once. By breaking it into sixty-four-pixel chunks, the math becomes manageable. It’s a "divide and conquer" strategy. Within those sixty-four pixels, the algorithm is performing a mathematical ritual to decide what your eyes can actually see. It converts those spatial pixels into frequency components. Think of it like a musical chord. A complex image has high-frequency "notes"—sharp edges, fine textures—and low-frequency "notes"—broad washes of color.
And since humans are famously bad at seeing fine detail in high-frequency areas, the algorithm just... tosses them in the bin?
Well, not exactly—sorry, I shouldn't say that. It quantizes them. It reduces the precision of those high-frequency components. If you have a slider in your photo editor that says "Quality: seventy percent," what you’re actually doing is telling the quantization table to be more aggressive about rounding those numbers down to zero. When you push it too far, the math can no longer bridge the gap between those eight-by-eight blocks, and that is where the "blocking artifacts" come from. The edges of the math become visible to the naked eye.
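For listeners who want to poke at this, here is a toy version of the loop Herman just described, using scipy's DCT routines. The step table below is invented for illustration (real JPEG quantization tables are tuned to human vision), but the rounding is the same idea:

```python
# Toy JPEG core: 2-D DCT on one 8x8 block, then quantization that
# rounds high-frequency coefficients toward zero. The step table is
# illustrative, not the actual JPEG quantization table.
import numpy as np
from scipy.fft import dctn, idctn

block = np.random.randint(0, 256, (8, 8)).astype(float) - 128  # center on zero

coeffs = dctn(block, norm="ortho")  # spatial pixels -> frequency "notes"

# Crude quality knob: lower quality means bigger steps, so more
# high-frequency coefficients get rounded all the way to zero.
quality = 0.7
steps = (np.add.outer(np.arange(8), np.arange(8)) * 12 + 8) / quality

quantized = np.round(coeffs / steps)              # the lossy part
restored = idctn(quantized * steps, norm="ortho")

print(f"coefficients kept: {np.count_nonzero(quantized)}/64")
print(f"max pixel error after round trip: {np.abs(block - restored).max():.1f}")
```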
It’s like a painter who only has a certain amount of paint, so they spend all of it on the person’s face and then just use a big, dirty sponge for the background. It looks fine from across the room, but if you get close, you see the sponge marks.
That’s a perfect analogy. And for thirty years, we just accepted the sponge marks because the alternative was a file so large it would take five minutes to download. It was the ultimate "fake it till you make it" strategy, and it worked! It’s the reason the early web didn’t take forty minutes to load a single headshot.
But we’ve moved past the "Holy Trinity" of JPEG, PNG, and GIF now, right?
We definitely have, though it took a surprisingly long time to dethrone the king. For a long time, if you wanted transparency, you had to use PNG, which is lossless but produces massive files. If you wanted animation, you used GIF, which is limited to two hundred fifty-six colors and is essentially a digital fossil at this point. Then, in twenty ten, Google entered the chat with WebP.
Ah, WebP. The format that every time I try to "Save Image As" from a browser, I get annoyed because my local image viewer from twenty-fifteen doesn't know what to do with it.
You’ve got to update your viewer, Corn! But your frustration is actually a sign of its success. WebP was a brilliant bit of recycling. Google took the intra-frame coding technology from the VP8 video codec and wrapped it into an image format. It brought a concept called "predictive coding" to the table.
Predictive coding sounds like something an AI would do. Is the file trying to guess what I’m looking at?
In a way, yes. Instead of just looking at an eight-by-eight block in isolation like JPEG, WebP looks at the blocks it has already encoded and tries to predict what the next block will look like. It says, "Hey, the last three blocks were blue sky, I bet this one is too." Then, it only stores the "residual"—the difference between its guess and the actual pixels.
So it’s like if I’m giving you directions. Instead of saying "Turn left on Main Street, go three blocks, then turn left on Oak," I just say "Do what you did last time, but go one block further."
Exactly, and it’s significantly more efficient than just throwing data away. It’s like the difference between someone telling you the whole story from the beginning and someone just giving you the updates since the last time you talked. Because of that predictive jump, WebP can get files twenty-five to thirty-five percent smaller than a JPEG of the same visual quality. And it supports transparency like a PNG and animation like a GIF. It was the first real "Swiss Army Knife" of image formats.
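Here is the "store only the difference" trick in miniature, on a single row of pixels with the simplest possible predictor. WebP's real predictors work on blocks and have several modes; this only shows the principle:

```python
# Toy predictive coding: guess each pixel from its left neighbor and
# keep only the residual. Smooth data (blue sky) leaves residuals that
# are mostly zeros and ones, which entropy coders compress very well.
import numpy as np

row = np.array([200, 201, 201, 202, 203, 203, 204, 205])  # a smooth gradient

prediction = np.concatenate(([0], row[:-1]))  # "same as the last pixel"
residual = row - prediction                   # this is what gets stored

print("raw values:", row)
print("residuals: ", residual)

decoded = np.cumsum(residual)                 # the decoder rebuilds exactly
assert np.array_equal(decoded, row)
```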
So why did it take a decade for it to become the standard? I feel like I only started seeing WebP everywhere in the last five or six years.
Browser wars, mostly. Safari was the big holdout. Apple didn't add WebP support until iOS fourteen and macOS Big Sur back in late twenty-twenty. Until then, web developers had to serve JPEGs to iPhone users and WebP to Chrome users. Once Apple blinked, the floodgates opened.
It’s always Safari, isn't it? The speed bump on the highway of progress. But even as WebP was taking over, something else was lurking in the shadows. You mentioned AVIF earlier. If WebP is the Swiss Army Knife, what is AVIF? A laser-guided scalpel?
Pretty much. AVIF is the AV1 Image File Format, and it’s derived from the AV1 video codec—which is the current state-of-the-art for video. If WebP was a twenty-five percent improvement over JPEG, AVIF is a fifty percent improvement. We are talking about taking a two-megabyte high-res photo and crushing it down to three hundred kilobytes without your eyes ever knowing the difference.
Fifty percent? That’s massive. If I’m a developer running a site with thousands of images, cutting my payload in half is the difference between a snappy user experience and someone bouncing because the page took three seconds to load on a 4G connection.
It’s a game changer for mobile performance. We actually saw a case study recently from a major e-commerce platform. They switched their entire catalog to AVIF with a WebP fallback. They saw a thirty percent reduction in total image weight across the site, which translated to a one-point-two-second improvement in "Largest Contentful Paint" on mobile devices. In the world of e-commerce, a one-second faster load time can literally mean millions of dollars in recovered conversions.
I love that we’re at a point where "better math" results in "more money." But what’s the catch? There’s always a catch. If AVIF is so much better, why haven't we deleted every JPEG on the planet? Is it just compatibility?
Compatibility is part of it, but the real catch is computational cost. AVIF is incredibly complex to encode. It’s using advanced tools like "chroma-from-luma" prediction and far more flexible block partitioning. It doesn’t just stick to eight-by-eight; blocks can range from four-by-four all the way up to one hundred twenty-eight-by-one hundred twenty-eight. It takes a lot of CPU cycles to figure out those fifty-percent savings. If you’re a photographer trying to export a thousand wedding photos, saving them as AVIF might take five times longer than saving them as JPEG.
So, it’s a trade-off between the server’s time and the user’s time. If I’m Instagram, I’m happy to spend the extra processing power once to save the bandwidth for a billion users. But if I’m just a guy on a laptop, I might stick to what’s fast.
Mostly. But as of twenty twenty-six, hardware acceleration for AV1 is becoming standard in most phone chips and PC processors. So that "cost" is shrinking every day. However, there is another player in the room that photographers actually care about more than AVIF, and that is JPEG XL.
JPEG XL? That sounds like the "Extra Large" version of the old format. Is it just JPEG with more pixels?
No, the "XL" actually stands for "Long-term." It’s a completely new architecture, and it’s fascinating because it solves a problem that WebP and AVIF actually ignore: legacy migration.
Talk to me about legacy migration. Are we talking about converting my old vacation photos? Because I have a hard drive from two thousand eight that is basically a graveyard of low-res JPEGs.
Yes, but in a way that feels like magic. JPEG XL has a feature called "lossless transcoding." You can take an existing JPEG—one of those old, blocky ones from ten years ago—and convert it to a JPEG XL file. It will be twenty percent smaller, but it is mathematically identical to the original. You aren't "re-compressing" it and losing more quality; you’re just repacking the data more efficiently. And here’s the kicker: you can convert it back to the original JPEG at any time, bit-for-bit.
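You can check that round trip yourself with libjxl's command-line tools, cjxl and djxl. A minimal sketch, assuming both are on your PATH and using a placeholder filename (exact defaults can vary by libjxl version):

```python
# Lossless JPEG -> JPEG XL -> JPEG round trip, verified by hashing.
# cjxl transcodes an existing JPEG losslessly by default; djxl can
# reconstruct the original JPEG bit for bit from the .jxl file.
import hashlib
import subprocess

def sha256(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

subprocess.run(["cjxl", "vacation.jpg", "vacation.jxl"], check=True)
subprocess.run(["djxl", "vacation.jxl", "restored.jpg"], check=True)

print("bit-for-bit identical:", sha256("vacation.jpg") == sha256("restored.jpg"))
```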
Wait, that’s huge. So I can save twenty percent of my storage space today, and if for some reason the world decides to go back to nineteen ninety-two, my files are still "original"? It’s like a zip file that stays an image.
Precisely. Well—there I go again with the "P" word. Let's just say, that is exactly why the professional archiving and photography communities are obsessed with it. JPEG XL also handles ultra-high resolutions, over a billion pixels on a side, and it supports high bit depths like ten, twelve, or even thirty-two-bit float for HDR. AVIF is great for the web, but JPEG XL is looking like the future of high-end digital imaging.
But wait, if JPEG XL is so good, why did I read that it was struggling? I remember hearing that Google Chrome actually removed support for JPEG XL a few years back. It felt like they were trying to kill it off to favor their own formats like WebP or AVIF.
They did. In twenty-twenty-two, they pulled the experimental support, claiming there wasn't enough interest, which caused a huge outcry in the developer community. It was a very controversial move because JPEG XL is actually faster to decode than AVIF and has better features for high-end photography. But as of the last year or so, we’ve seen a massive resurgence. Professional tools like Adobe Photoshop and the major camera manufacturers are starting to bake it in. It’s a classic battle between "efficiency for the web" versus "quality for the creator."
It feels like the "Image Wars." We’ve got AVIF in the red corner, backed by the web performance crowd, and JPEG XL in the blue corner, backed by the photographers. Who wins? Or do we just end up in a world where we need twelve different codecs to open a meme? Is there a risk of "format exhaustion" where we just give up and go back to screenshots?
The winner, honestly, is the "Image CDN." If you’re a user browsing the web in twenty twenty-six, you probably don't even know what format you’re looking at. Services like Cloudinary or Akamai use something called "Content Negotiation." When your browser requests an image, the server looks at your "Accept" header. It says, "Oh, Corn is using a modern browser that supports AVIF? Cool, I’ll send him the tiny AVIF version." If you’re on some ancient tablet that only speaks JPEG, it sends you the old-school version.
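The decision itself is almost embarrassingly simple. A minimal sketch of the server-side logic, with the caching and on-the-fly transcoding a real CDN does left out:

```python
# Content negotiation in miniature: pick the best image format the
# client's Accept header admits, most efficient first.
def pick_format(accept_header: str) -> str:
    for mime, ext in [("image/avif", "avif"), ("image/webp", "webp")]:
        if mime in accept_header:
            return ext
    return "jpg"  # the universal fallback

# A modern browser advertises AVIF support in its Accept header...
print(pick_format("image/avif,image/webp,image/apng,*/*;q=0.8"))  # avif
# ...while an ancient tablet only speaks the classics.
print(pick_format("image/png,image/*;q=0.8,*/*;q=0.5"))           # jpg
```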
So the file extension on the end of the URL is basically a lie. I could be looking at a ".jpg" that is actually an AVIF under the hood?
It’s a suggestion. The server is doing the heavy lifting behind the scenes. And this is where it gets really interesting when we look at the "second-order effects" Daniel’s prompt hints at. This isn't just about making pages load faster. It’s about sustainability.
Sustainability? I didn't think my cat photos were killing the planet, Herman. Are we really saying a smaller file size saves the polar bears?
Think about the scale, though. Images account for roughly half the bytes of a typical web page, and the web serves an unimaginable number of pages every day. Every time you scroll through a social media feed, you’re downloading megabytes of image data. If we can cut that roughly in half globally by moving from JPEG to AVIF or JPEG XL, we are talking about a massive reduction in the electricity required to power data centers and transmission lines. Smaller files mean less cooling, less bandwidth, and less hardware. Efficiency is a "green" technology. It’s estimated that moving the entire web to modern image formats could save as much energy as taking thousands of cars off the road.
That’s a perspective I hadn't considered. Every time I use a more efficient codec, I’m technically saving a tiny bit of the power grid. I feel more virtuous already. But let’s get practical for a second. If I’m a designer or a developer listening to this, what should I actually be doing? Should I go out and convert every PNG on my site to an AVIF right now?
Not manually. That’s a recipe for a headache. The first takeaway is to use the HTML "picture" element. It’s built for this. You can list your AVIF file as the primary source, then a WebP file as the second source, and finally a JPEG as the fallback. The browser will automatically pick the best one it can handle. It’s the "set it and forget it" of web development.
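For reference, the markup looks something like this, with placeholder filenames. The browser walks the sources top to bottom and uses the first type it supports:

```html
<picture>
  <source srcset="hero.avif" type="image/avif">
  <source srcset="hero.webp" type="image/webp">
  <!-- every browser, no matter how old, can render this fallback -->
  <img src="hero.jpg" alt="Product hero shot" width="1200" height="800">
</picture>
```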
And what about the quality settings? I’ve always been told "save at eighty percent and you’re fine." Does that rule still apply to these new formats? Or do we need a new rule of thumb?
Not really. The "quality" scale in AVIF isn't the same as JPEG. An AVIF at "thirty" might look better than a JPEG at "eighty." My advice is to use tools like Squoosh—which is a web app Google put out—to actually do a side-by-side comparison. You can slide a divider across the image and see exactly where the artifacts start to appear. Most people find they can push AVIF much lower than they think, saving a ton of space without any visible degradation.
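Since the quality scales don't line up across formats, measuring beats guessing. A tiny sketch with Pillow's WebP encoder and a placeholder filename (swap in AVIF if your Pillow build supports it):

```python
# Sweep an encoder's quality knob and watch the file size; pair this
# with a Squoosh-style visual check to find your own sweet spot.
import io
from PIL import Image

img = Image.open("photo.jpg")  # placeholder filename

for q in (30, 50, 80):
    buf = io.BytesIO()
    img.save(buf, format="WEBP", quality=q)
    print(f"quality {q}: {buf.tell() / 1024:.0f} KiB")
```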
I’ve used Squoosh! It’s actually quite fun. It’s like a magnifying glass for math errors. But what about the AI side of this? You mentioned earlier that AI is starting to get involved in how we compress things. Is the future just an AI looking at my photo and saying, "I know what a tree looks like, I’ll just delete the tree and recreate it on the user’s end"?
We are actually seeing that. There is a concept called "Neural Compression." Instead of using hand-written mathematical formulas like DCT, we use a neural network. The encoder "learns" a compact representation of the image, and the decoder—on your phone—uses another neural network to "reconstruct" it. It’s similar to how DLSS works in video games, where the computer renders at a low resolution and AI "fills in" the missing detail.
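A bare-bones sketch of that encoder/decoder idea in PyTorch. Real neural codecs add entropy models and rate-distortion training; this untrained skeleton just shows where the compression happens:

```python
# A tiny convolutional autoencoder: the encoder squeezes the image into
# a small latent, quantization makes it cheap to store (and lossy), and
# the decoder reconstructs pixels from the latent alone.
import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 8, 4, stride=2, padding=1),   # 3 channels -> 8, at 1/4 size
)
decoder = nn.Sequential(
    nn.ConvTranspose2d(8, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
)

image = torch.rand(1, 3, 64, 64)        # stand-in for a real photo
latent = encoder(image)
code = torch.round(latent * 16) / 16    # quantize: this is the lossy step
reconstruction = decoder(code)

print("pixel values in:", image.numel(), "-> latent values stored:", code.numel())
```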
That sounds dangerous. What if the AI thinks my dog is a cat and "reconstructs" him with pointy ears? Or what if it decides my uncle needs a better hairline in the family reunion photo?
That is the "hallucination" risk. If the compression is too aggressive, the AI might fill in details that weren't there. It might turn a grainy texture into a smooth one or change the shape of a distant face. It’s why neural compression is currently mostly used for things like video calls where a little bit of "faking it" is okay for the sake of fluid motion. For archiving your family photos, we’re still sticking to the "deterministic" math of things like JPEG XL.
Good. I don't want my grandkids looking at photos of me and wondering why I have six fingers because the compression algorithm thought it looked "more natural." So, looking ahead, where does this go? Are we going to see a "JPEG XXL" in five years? Or is the evolution starting to plateau?
I think we’re reaching the limits of "lossy" compression where the gains are becoming marginal to the human eye. The next frontier is "semantic compression."
Semantic compression. Explain that to me like I’m a sloth who’s only had one cup of coffee.
It’s compression based on meaning. Imagine a photo of you sitting in a park. A semantic compressor identifies "Corn," "bench," "grass," and "trees." It realizes that the grass doesn't need to be perfectly sharp for the image to work. It compresses the grass heavily but keeps the textures of your face perfectly clear. It’s a "content-aware" approach that prioritizes what humans actually care about in a scene.
So it’s basically an automated version of what a photographer does when they use a shallow depth of field. Keep the subject sharp, let the rest be a blur of efficient data. Does it know to prioritize things like text or faces?
It uses saliency maps to determine which parts of the image will draw the human eye first. And when you combine that with the HDR capabilities of something like AVIF or JPEG XL, the results are stunning. We’re moving away from "flat" images to ones that can actually represent the true brightness of the sun or the deep shadows of a forest. JPEG was an eight-bit format. It could only represent two hundred fifty-six levels of brightness per channel. Modern formats go way beyond that, supporting ten-bit or twelve-bit color, which means billions of colors instead of millions.
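Here is a crude approximation of that content-aware idea using Pillow: compress the whole frame hard, then paste the salient region back in at high quality. The filename and the box are placeholders; a real system would get the region from a saliency model, and a real semantic codec would vary quantization inside a single stream instead of pasting:

```python
# Spend the bits where the eyes land: brutal quality everywhere, high
# quality on the region a saliency model says people will look at.
import io
from PIL import Image

def recompress(im, quality):
    buf = io.BytesIO()
    im.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf)

img = Image.open("park.jpg")        # placeholder filename
face_box = (400, 120, 560, 300)     # placeholder; from a saliency map

cheap = recompress(img, quality=15)                 # the grass can be mushy
sharp = recompress(img.crop(face_box), quality=90)  # the face cannot

cheap.paste(sharp, face_box[:2])    # drop the sharp crop back in place
cheap.save("park_semantic.jpg")
```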
It’s funny how we’ve spent thirty years trying to make digital images look as good as film, and now that we’ve finally surpassed it, we’re spending all our energy trying to make those files small enough to send to someone in half a second.
It’s the cycle of tech. More power leads to more data, which leads to a need for better compression, which frees up more power. But the real takeaway for everyone listening is that the "defaults" are changing. If you’re still using JPEG for everything, you’re basically leaving money—and battery life—on the table.
I wonder if we’ll ever reach a point where the file format is just... "Image." Like, the computer just handles whatever math is necessary and we never have to see a three-letter extension again.
We're getting closer. With containers like HEIF—which is what iPhones use—you can actually store multiple versions of an image, or even an image and a video, in one file. The "extension" is becoming less about the math and more about the container.
Well, I’m going to go audit my website and see how many "legacy" JPEGs I can hunt down. It’s time to move into the twenty-twenties, even if it’s twenty twenty-six already. Herman, this has been an eye-opener. I’ll never look at a blocky sky the same way again. I'll just see it as a cry for help from a nineteen ninety-two algorithm.
Just remember, those blocks are just the math saying "I give up." With AVIF and JPEG XL, the math never has to give up. It’s got a lot more tools in the toolbox now.
A beautiful sentiment to end on. If you’ve enjoyed this deep dive into the pixels, do us a favor and leave a review on your favorite podcast app. It really helps other nerds find the show.
Big thanks to our producer, Hilbert Flumingtop, for keeping the gears turning behind the scenes.
And a huge shout-out to Modal for providing the GPU credits that power this whole operation. We couldn't do it without them.
This has been My Weird Prompts. You can find us at myweirdprompts dot com for all the links and ways to subscribe.
Catch you in the next one. Stay sharp—or at least, stay efficiently compressed.
See ya.