Herman, I was looking at the source code for a few of the sites I follow the other day, and it struck me how much of a paradox we are living in right now. It is March of two thousand twenty six, and we have these incredibly fast, lean, static architectures built for performance and security. We use frameworks that pre-render everything and global content delivery networks to put files milliseconds away from users, yet the moment you look under the hood, they are often weighed down by massive, bloated tracking scripts. It is like buying a high performance racing car and then insisting on dragging a heavy trailer full of surveillance equipment behind it just so you can see where you have been. It feels fundamentally at odds with the whole philosophy of the modern web.
Herman Poppleberry here, and you are right, Corn. It is what I have been calling the analytics paradox. We spend all this time optimizing our build processes, minimizing our bundle sizes, and fine tuning our edge caching, only to invite third party scripts to execute arbitrary code on our users' machines. We are essentially handing over the keys to our performance budget to companies whose primary goal is data harvesting, not user experience. Our housemate Daniel actually sent us a prompt about this very thing this morning. He was asking about the best ways to get website insights without falling into that trap of invasive tracking. It is a question that hits right at the heart of how we build for the web in two thousand twenty six, especially as the old methods are breaking before our eyes.
I am glad Daniel brought this up because it feels like we are at a definitive turning point. For a long time, the default answer for any webmaster was just to slap a snippet of Google Analytics on everything and call it a day. It was the industry standard. But the landscape has shifted beneath our feet. We are not just talking about privacy regulations like the General Data Protection Regulation or the California Consumer Privacy Act anymore, although those are huge. We are seeing a fundamental shift in how users interact with the web. People are more protective of their data than ever before, and browsers have become much more aggressive at blocking the very scripts that traditional analytics rely on. The "surveillance web" is hitting a wall of technical and social resistance.
For a static site specifically, the stakes are different. When you are running a serverless or static setup, you do not necessarily have a traditional backend server logging every request in a way that is easy to parse. You do not have a Ruby on Rails or a Django instance sitting there recording every hit to a database. So, developers often feel forced toward client side solutions because they seem like the only way to see what is happening. But today, we want to break down why that might be a mistake and look at the alternatives. We are going to look at client side tracking, server side log analysis, and what I think is the most compelling development lately, which is proxy based event streaming. We want to move from user surveillance to what I call traffic intelligence.
Let's start by defining the categories because the terminology can get a bit muddy. When we talk about the analytics stack for a static site, we are really looking at where the data collection happens. Is it happening in the user's browser, which is the client side? Is it happening at the edge of the network where the file is served, which we call edge side? Or is it happening through some sort of intermediary proxy that you control? Each of these has a different impact on privacy, performance, and data accuracy.
Let's start with the one everyone knows, which is client side tracking. This is your Google Analytics four, your Matomo, or your Plausible. The way it works is fairly straightforward. You include a small piece of JavaScript in your site's header. When a user loads the page, that script executes. It looks at the Document Object Model, or D-O-M, gathers information about the screen resolution, the referral source, the browser version, and the page title, and then it sends a beacon or a post request back to a central server. Because it is running in the browser, it can see things that a server cannot, like how far someone scrolled or if they hovered over a specific image.
Right, and the advantage there has always been the depth of data. Because the script is a guest in the user's browser, it can watch their every move. It can track clicks on specific buttons that do not trigger a new page load, it can see how long the tab was actually active versus just sitting in the background, and it can try to stitch together a "user journey" across multiple sessions. But here is the catch. Herman, from what we are seeing in recent reports, what percentage of that traffic is actually being captured by these scripts?
It is becoming a real reliability crisis for anyone who cares about accurate data, Corn. Current data as of March two thousand twenty six suggests that over forty percent of global internet traffic is now filtered by some form of ad blocker or privacy focused browser extension. If you are using Brave, or Safari with its latest Intelligent Tracking Prevention, or even Chrome with the new Privacy Sandbox features, those traditional tracking scripts are often blocked by default. So, as a webmaster, you might look at your dashboard and see a thousand visitors, but in reality, you might have had sixteen hundred or even two thousand. You are making business or design decisions based on a subset of users who are specifically not blocking trackers, which creates a massive selection bias. You are essentially invisible to the most tech savvy part of your audience.
Not to mention the performance hit, which is something we talk about constantly on My Weird Prompts. Even a small script has to be fetched, parsed, and executed. On a static site that is supposed to be lightning fast, that extra round trip to a third party server can actually hurt your Core Web Vitals. We are talking about metrics like Largest Contentful Paint and Interaction to Next Paint. If your tracking script blocks the main thread while it is trying to figure out the user's screen resolution, the user experiences a laggy, unresponsive site. It seems counterproductive to optimize your images and agonize over your choice of framework just to let a heavy tracking script drag down your performance scores. You are paying a high price in user experience for data that is increasingly inaccurate.
That leads us to the second category, which is capturing data at the network level, or what we often call edge side analytics. This is the "cleaner" way. If you are using a provider like Cloudflare or Vercel or Netlify to serve your static site, they are already seeing every single request that comes in. They do not need a script to run in the browser because they are the ones delivering the bits to the user. When a browser asks for index dot H-T-M-L, the edge server logs that request. It knows the time, the file requested, the approximate location based on the Internet Protocol address, or I-P, and the referral header.
This feels much more honest. It is just looking at the traffic that is already happening. It is like a shopkeeper counting how many people walk through the door rather than hiring a private investigator to follow every customer around the store. But I imagine there are some technical trade offs there when it comes to the granularity of the data. If there is no script running on the page, how does an edge provider know if someone scrolled to the bottom of a long article or clicked a specific link that does not trigger a new page load?
We have to be realistic about that. Edge side analytics are incredibly accurate for traffic volume. They capture one hundred percent of requests because they happen before the browser even has a chance to block anything. If a bot scrapes your site or a user with intense privacy settings visits, the edge server still sees the request for the file. However, you lose that deep behavioral data. You get the "what," as in what files were requested, but you lose some of the "how." You can see that a user visited five pages, but you might not know if they spent three minutes reading your long form essay or if they just bounced immediately. You also have to deal with bot traffic differently. Since edge logs see everything, they see every automated crawler and script kiddie on the internet. A good edge analytics tool has to have a very sophisticated way of filtering out that noise to give you a "human" view of your traffic.
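To make that filtering step concrete, here is a minimal Python sketch of the kind of user agent heuristic an edge analytics tool might start from. The log format and the marker list are illustrative assumptions; production bot detection combines many more signals, like request rate, network origin, and TLS fingerprints.

```python
# Minimal sketch of edge-log filtering: separate likely bots from
# likely humans using the User-Agent header. The log format here is
# a simplified, hypothetical one; real tools use far richer signals.

BOT_MARKERS = ("bot", "crawler", "spider", "curl", "python-requests")

def is_likely_bot(user_agent: str) -> bool:
    """Crude heuristic: flag well-known automated clients."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)

def summarize(log_entries):
    """Count human vs. bot requests per path."""
    human, bot = {}, {}
    for entry in log_entries:
        bucket = bot if is_likely_bot(entry["user_agent"]) else human
        bucket[entry["path"]] = bucket.get(entry["path"], 0) + 1
    return human, bot

logs = [
    {"path": "/", "user_agent": "Mozilla/5.0 (Macintosh) Safari/605.1"},
    {"path": "/", "user_agent": "Googlebot/2.1 (+http://www.google.com/bot.html)"},
    {"path": "/post", "user_agent": "Mozilla/5.0 (X11; Linux) Firefox/125.0"},
]
human, bot = summarize(logs)
print(human)  # {'/': 1, '/post': 1}
print(bot)    # {'/': 1}
```

The point of the split is exactly what is described above: the raw logs see everything, so the "human" view has to be derived, not assumed.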
For many people, especially those running blogs, documentation sites, or small business pages, volume and referral sources are probably eighty percent of what they actually need. They want to know which articles are popular and where the readers are coming from. You do not necessarily need to know their mouse movements to understand if your content is resonating. If they are requesting the next page in a series, you know they are reading.
There is a huge privacy win here. Most edge side solutions, like Cloudflare Web Analytics, can be configured to be completely anonymous. They process the logs, aggregate the data into buckets, and then discard the personally identifiable information like the full I-P address. You end up with a high level view of your traffic that is technically impossible to trace back to an individual user. It is the ultimate privacy first approach because you are never even asking the user's browser to execute tracking code. You are just analyzing your own server's activity.
I want to move into something a bit more complex that Daniel mentioned in his prompt, which is the idea of tracking media files. This is a specific pain point for us. We host this podcast, My Weird Prompts, and our audio files are stored in a Cloudflare R-two bucket. Now, an audio file is just a static asset. You cannot put a JavaScript tracking snippet inside an M-P-three file. So if someone downloads the episode directly or listens through a podcast app like Overcast or Apple Podcasts, how on earth do we get any data on that without being invasive?
This is where it gets really interesting, and it involves using a proxy. Instead of giving the user a direct link to the file in the R-two bucket, you give them a link to a small piece of code, like a Cloudflare Worker, that sits in front of that bucket. This is what we call proxy based event streaming. When a user clicks play, the request goes to the Worker first. The Worker then does two things simultaneously. It fetches the audio file from the bucket and streams it to the user, and at the same time, it logs the event to an analytics database or a logging service.
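The two jobs the Worker does can be sketched in miniature. This is illustrative Python rather than actual Workers JavaScript (a real Worker would use fetch and something like ctx.waitUntil so the logging never delays the response), and the object store and event log here are stand-ins, but the shape of the flow is the same: serve the bytes, record the event.

```python
# Sketch of the proxy flow: one handler both serves the asset and
# records the event. The object store dict stands in for the R2
# bucket; the list stands in for an analytics database.

def handle_request(path, headers, object_store, event_log):
    """Serve a static asset and log the serve as a side effect."""
    body = object_store.get(path)
    if body is None:
        return 404, b""
    # Record the event; a real Worker would do this asynchronously
    # so the listener's stream is never delayed.
    event_log.append({
        "path": path,
        "range": headers.get("Range"),
        "referrer": headers.get("Referer"),  # HTTP's historical spelling
    })
    return 200, body

store = {"/ep801.mp3": b"\x00" * 1024}  # hypothetical episode file
events = []
status, body = handle_request("/ep801.mp3", {"Range": "bytes=0-511"},
                              store, events)
print(status, len(body), len(events))  # 200 1024 1
```

Because the handler is the only way to reach the file, every play shows up in the ledger, with no script in the listener's app or browser.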
So the Worker acts as a middleman. It is like a gatekeeper who hands you the file but also makes a quick note in a ledger that a file was handed out.
And because you control the Worker, you can make it as private as you want. The Worker sees the user's I-P address, but instead of storing it, the Worker can hash that address with a "daily salt"—a random string of characters that changes every twenty four hours. This allows you to tell if it is the same user coming back to finish an episode later that day, but it makes it impossible for you or anyone else to know who that user actually is. You are recording an event, not a person.
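A minimal Python sketch of that daily salt scheme, with the rotation made explicit. The class name and the sixteen character token length are illustrative choices; the essential property is that the old salt is discarded on rotation, so yesterday's tokens can never be linked back to an address.

```python
# Daily-salted visitor hashing: the same IP hashes to the same token
# within one day, and to an unrelated token the next day. The day is
# passed in explicitly here to keep the sketch deterministic to test.
import hashlib
import secrets

class DailyVisitorHasher:
    def __init__(self):
        self._salt = secrets.token_hex(16)
        self._salt_day = None

    def token(self, ip: str, day: str) -> str:
        # Rotate the salt when the day changes; the old salt is
        # forgotten, which is what breaks long-term linkability.
        if day != self._salt_day:
            self._salt = secrets.token_hex(16)
            self._salt_day = day
        digest = hashlib.sha256(f"{self._salt}:{ip}".encode())
        return digest.hexdigest()[:16]

hasher = DailyVisitorHasher()
a = hasher.token("203.0.113.7", "2026-03-14")
b = hasher.token("203.0.113.7", "2026-03-14")
c = hasher.token("203.0.113.7", "2026-03-15")
print(a == b)  # True: same visitor, same day
print(a == c)  # False: salt rotated, link broken
```

You get exactly the property described: enough continuity to count a returning listener within the day, and nothing that identifies a person.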
This is how you get accurate podcast metrics in two thousand twenty six. Since the Worker is the one serving the file, it can track things like byte range requests. If a podcast app only downloads the first ten percent of a file to preview it, the Worker can see that and distinguish it from a full download. This is a much more robust way of measuring engagement than just counting how many times a link was clicked. And because the proxy is under your control, you are the one who decides what data is kept and what is thrown away. You are not sending that data off to some giant advertising network that is going to use it to build a profile of your listeners.
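That byte range distinction can be sketched like this. The ten percent preview threshold and the labels are assumptions for illustration, not an industry standard; the mechanics of reading the Range header are the real point.

```python
# Classify a request from its Range header, to distinguish a
# preview fetch from a full download. Thresholds are illustrative.

def classify_range(range_header, file_size: int) -> str:
    """Return 'full', 'preview', or 'partial' for one request."""
    if range_header is None:
        return "full"  # no Range header: the whole file was requested
    # Range headers look like "bytes=0-65535" or "bytes=1000-"
    spec = range_header.removeprefix("bytes=")
    start_s, _, end_s = spec.partition("-")
    start = int(start_s or 0)
    end = int(end_s) if end_s else file_size - 1
    fraction = (end - start + 1) / file_size
    if start == 0 and fraction < 0.10:
        return "preview"  # an app sniffing the start of the file
    return "full" if fraction > 0.95 else "partial"

size = 50_000_000  # a hypothetical 50 MB episode file
print(classify_range(None, size))                       # full
print(classify_range("bytes=0-65535", size))            # preview
print(classify_range("bytes=0-", size))                 # full
print(classify_range("bytes=10000000-30000000", size))  # partial
```

A download counter sees four plays in that example; the range-aware view sees one preview, two full listens, and one partial fetch, which is a far more honest engagement number.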
It is a powerful middle ground. It gives you more control than raw logs but maintains the privacy and performance of a scriptless architecture. But I wonder about the cost and complexity. Setting up a Cloudflare Worker and a database to store those events sounds a lot harder than just pasting in a Google Analytics snippet. Is this something a regular webmaster can actually pull off?
It is definitely a higher barrier to entry, but the tools are getting much better. There are now open source templates for these kinds of workers that connect directly to privacy focused databases like Tinybird or ClickHouse. And honestly, if you care about data sovereignty, it is worth the effort. We talked about this in episode seven hundred ninety six, the idea that keeping your data on your own infrastructure is becoming a legal and ethical imperative. When you use a third party tracker, you are essentially giving them a window into your users' lives. When you use a proxy you control, you are closing that window and keeping the data within your own digital borders.
This ties into something else we covered in episode seven hundred fifty three, which was about agentic behavior. As we move deeper into this era of A-I agents and automated scrapers, traditional analytics are becoming even more skewed. An A-I agent that is scraping your site to summarize it for a user will not execute your JavaScript trackers. It just grabs the H-T-M-L and leaves. If you only rely on client side JavaScript, you are completely blind to the A-I agents interacting with your site. In two thousand twenty six, that could be a significant portion of your traffic.
That is a fascinating shift. If a large language model's crawler visits my site to update its knowledge base, I want to know that. Not because I want to track the A-I, but because I want to know how my content is being consumed and repurposed. If I only use Google Analytics, I might think my traffic is dying, while in reality, I am being cited by thousands of A-I agents every day. To see that, you have to look at the server side or edge side logs. The agents leave footprints in the logs that the JavaScript trackers never see.
If we are looking for a recommendation for Daniel and our listeners, it sounds like the best approach is a layered one. Maybe you do not need one single tool to do everything. Perhaps you use edge side analytics for your overall traffic volume because it is fast, accurate, and privacy friendly. And then, for specific high value actions like media plays or form submissions, you use a targeted proxy or a server side event.
I think that is the gold standard right now. Use something like Cloudflare Web Analytics or a similar edge based tool for your high level metrics. It is free, it requires no cookies, and it does not slow down the site. Then, if you have a podcast or a video series, set up a simple proxy worker to track those specific downloads. It keeps your site lean, respects your users' privacy, and gives you much more reliable data than you would get from a script that half your audience is blocking anyway.
It moves us away from this idea of user surveillance and toward what I like to call traffic intelligence. We do not need to know who the user is, what other tabs they have open, or what their digital fingerprint looks like. We just need to know if our content is being consumed and how people are finding it. There is this widespread assumption that you need cookies and persistent identifiers to understand if your content is popular. But that is just not true. Request level metadata, if handled correctly, is more than sufficient for almost every static site use case.
You can see referral headers to know if people are coming from social media or search engines. You can see which pages are the most requested. You can even see geographic distribution at a country or city level without ever needing to store a single personal detail. It is about being a good steward of the data you are entrusted with.
Let's talk about the practical side for a second. If someone is listening to this and they realize they have been relying on a bloated tracking script, how do they start the transition? What is the first step to auditing their current footprint?
The easiest thing to do is open your browser's developer tools. Go to the network tab and refresh your site. Look at how many requests are going to domains that you do not own. If you see a dozen different requests to Google, Facebook, or various ad networks, those are all potential privacy leaks and performance bottlenecks. You can actually see the "waterfall" of how these scripts delay your site's loading time. Once you see the scale of it, the motivation to switch becomes a lot stronger.
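The same audit can be done programmatically once you export the network trace. Here is a sketch, assuming you have the request URLs in hand (for example, from a saved HAR file); the tracker URLs below are just examples.

```python
# Count requests that leave your own domain, given a list of URLs
# from a network trace. Subdomains of your own domain are treated
# as first party; everything else is a potential leak to review.
from collections import Counter
from urllib.parse import urlparse

def third_party_hosts(urls, own_domain: str) -> Counter:
    counts = Counter()
    for url in urls:
        host = urlparse(url).hostname or ""
        if host != own_domain and not host.endswith("." + own_domain):
            counts[host] += 1
    return counts

trace = [
    "https://example.com/index.html",
    "https://cdn.example.com/style.css",
    "https://www.googletagmanager.com/gtag/js",
    "https://connect.facebook.net/en_US/fbevents.js",
    "https://www.googletagmanager.com/gtm.js",
]
print(third_party_hosts(trace, "example.com"))
# Counter({'www.googletagmanager.com': 2, 'connect.facebook.net': 1})
```

Each host in that output is a domain executing code or receiving data on your page, which is exactly the scale-of-the-problem view that makes switching feel urgent.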
And once they see that, they can start looking at edge side alternatives. If they are already on a platform like Cloudflare, it is often just a matter of toggling a switch in the dashboard to enable web analytics. You can then remove the JavaScript snippets from your code entirely. It is one of those rare moments in web development where you actually get to delete code and end up with a better result. You are reducing your attack surface, improving your performance, and respecting your users all at once.
It is incredibly satisfying. And for the media tracking, like we mentioned with our podcast, it is about moving the logic from the browser to the infrastructure. Instead of asking the browser to tell you when a file is played, you ask the infrastructure to tell you when a file is served. It is a subtle shift, but it changes everything about the privacy profile of your site. It makes your data more resilient to browser changes because you are working with the fundamental mechanics of the web, not trying to bypass them.
I think one of the things most people do not realize is that by moving to these privacy first, edge based solutions, you are actually future proofing your site. We are seeing a trend where browsers are becoming more and more restrictive. If you build your analytics strategy around invasive tracking, your data quality is only going to go down over time as browsers get smarter at blocking you. But if you build around server side logs and edge processing, your data quality stays consistent. You are no longer in a cat and mouse game with browser developers.
We are already seeing moves toward what people are calling the private advertising A-P-I and other ways to aggregate data without individual tracking. But for the independent creator or the small business, you do not need to wait for those complex systems. You can just stop tracking individuals and start tracking events. It is a much cleaner way to live. It also simplifies the legal side of things. If you are not collecting personally identifiable information, your compliance burden for things like the General Data Protection Regulation becomes significantly lighter. You do not need those annoying cookie banners that everyone hates because you are not using cookies for tracking. You are just running a clean, efficient web server.
It is funny, we have spent decades making the web more complicated and more invasive, and now the best practice is basically to go back to the way things were in the early days, just with much better tools. We are returning to a model where the server log is the source of truth, but now we have the computational power at the edge to process those logs in real time and give us beautiful, actionable dashboards without the privacy trade offs. We have come full circle, but with thirty years of engineering experience to make it work better.
It feels like a more honest relationship with the audience too. We are telling them, we value your time and your privacy enough to not follow you around the internet. We just want to know if you enjoyed the article or the podcast episode. It builds a level of trust that you just cannot get when you are using the same tracking pixels as the giant data brokers. It is about building a community, not a target list.
I agree. And I think Daniel's question really highlights that there is a growing community of developers who want to do things the right way. They are tired of the old way of doing things. They want performance and privacy, and they are realizing they do not have to choose one or the other. They can have a site that is both lightning fast and deeply respectful of the person on the other side of the screen.
Before we wrap up, I was just thinking about that point you made earlier about the selection bias in traditional analytics. It is actually even worse than just missing users. It is that you are specifically missing the most tech savvy and privacy conscious part of your audience. Those are often the people whose feedback and behavior you want to understand the most, and they are the ones who are completely invisible to Google Analytics.
You are essentially filtering out your most sophisticated users. If you are building a tool for developers or a high end technical blog, your analytics are going to be wildly inaccurate because your target demographic is the most likely to be using advanced blocking tools. You end up optimizing your site for the people who are the least like your actual core audience. It is a total feedback loop of bad data. You are building for the average, while your outliers are the ones driving innovation.
It is like trying to study the habits of rare birds but only looking at the ones that are comfortable enough to walk into a bright orange cage in the middle of a field. You are not getting a representative sample; you are getting a sample of the least cautious individuals. In the world of web development, that means you are missing the early adopters and the power users.
I love that analogy. It really highlights why the server side approach is so much more scientifically sound. You are observing the natural environment without disturbing it. You are getting the true picture of who is visiting and what they are doing, regardless of what tools they have installed in their browser. It is data you can actually trust to guide your roadmap.
It is a more robust way to do science, and it is a more robust way to do web development. I am glad we got to dive into that nuance. It is not just about the numbers being higher; it is about the numbers being more representative of reality. It is about truth in data. To summarize the recommendation for Daniel, step one is to look at edge side analytics. Step two is to remove third party client side scripts whenever possible. And step three, if you have specific assets like audio or video, look into using a proxy worker to log those events at the server level. It is a three step plan for a cleaner, faster web.
That is a solid roadmap. It covers the basics, handles the complex media cases, and keeps everything lean and private. And honestly, it is just more interesting from a technical perspective. Building a custom worker to handle your podcast downloads is a lot more rewarding than just copy pasting a script from a giant tech company. You actually understand how your data is being generated and stored.
Definitely. I think we have covered a lot of ground here. It is a complex topic, but the path forward is actually quite clear. We are moving toward a web where the infrastructure itself provides the intelligence we need, leaving the browser free to just do its job, which is rendering content for the user. We are losing the surveillance trailer and finally letting that racing car drive at the speed it was designed for. If you have been enjoying these deep dives into the technical and ethical side of the web, we would really appreciate it if you could leave us a review on your podcast app. It helps other people find the show and join the conversation about how we can build a better internet together.
It really does help. We see every review and it means a lot to us. And if you want to get in touch or see the archives, you can always head over to myweirdprompts dot com. We have over a thousand episodes there now, covering everything from static site scaling to the dark archives of the internet.
Check out episode seven hundred seventy two if you want to hear more about scaling those static sites, or episode seven hundred ninety six on data sovereignty. There is a lot of connective tissue between these topics. We are trying to map out the future of the web, one prompt at a time.
Thanks to Daniel for sending in such a timely prompt. It gave us a chance to really dig into the mechanics of how the web is changing for the better in two thousand twenty six. This has been Herman Poppleberry.
And Corn Poppleberry. We will see you in the next one.
Take care, everyone.
Goodbye.
Goodbye.