#2774: Open Data That Actually Works

The gap between open data promises and reality, and the rare cases where it actually changes policy.

Episode Details
Episode ID
MWP-2935
Published
Duration
34:17
Audio
Direct link
Pipeline
V5
TTS Engine
chatterbox-regular
Script Writing Agent
deepseek-v4-pro

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

Open government data portals have been around for about fifteen years, but the gap between what they promise and what they deliver is often a chasm. Many portals, like Israel's data.gov.il, end up as bewildering seas of information — full of datasets like airport vendor listings that feel more like performance art than transparency. The problem isn't just the data quality; it's that platforms like CKAN require institutional investment to be usable, and too often that investment doesn't happen.

The successful examples share a common pattern: an intermediary layer that translates raw data into something useful. The UK's data.gov.uk, launched in 2010, succeeded because of institutional scaffolding — the Open Data Institute trained civil servants, built partnerships with newsrooms, and created feedback mechanisms. This led to the Prescribing Data project, where data journalists found massive prescribing variations that saved the NHS hundreds of millions of pounds annually.

Chicago's open data portal embedded data into governance by creating a Chief Data Officer role with real authority. When the city published restaurant health inspection data, civic coders built apps that let residents search inspection histories. The city then used app usage data to prioritize enforcement. Similarly, New York's 311 data allowed researchers to expose systematic inequality in response times, directly informing City Council policy changes. These examples show that the government's job isn't to make data beautiful — it's to make it reliable, well-documented, and consistently available.


#2774: Open Data That Actually Works

Corn
Daniel sent us this one — he's been poking around Israel's national open data portal, data dot gov dot il, and came away with what I'd call a very specific kind of disappointment. The kind where you want to believe in the thing, but the thing keeps handing you a CSV of airport vendor listings and calling it transparency.
Herman
Which is a rich starting point, because the Ben Gurion Airport vendor dataset is genuinely one of the more memorable things I've heard described. It's almost performance art. Here is the state, solemnly informing you that there is, in fact, a Steimatzky in Terminal Three.
Corn
The government wants you to know: we have croissants. This is official.
Herman
The question underneath it is serious. Open government data portals — these things have been around in earnest for about fifteen years now, and the gap between what they promise and what they deliver is, in a lot of cases, a chasm. The prompt is asking us to find the examples where that gap actually closes. Where open data isn't just a CSV graveyard, but something that created a real surface for citizens, journalists, researchers to actually do something useful. And ideally, where it looped back into policy.
Corn
The airport vendor list is the control group. That's our baseline for "we uploaded a thing, please clap."
Herman
And I want to defend that baseline for about five seconds before we leave it behind, because there is a species of open data advocate who will tell you that even the weird obscure datasets matter. That you never know what someone will do with it. And there's a grain of truth there — I've seen researchers do remarkable things with datasets that seemed pointless at first glance. But that defense collapses when the portal is so hard to use that nobody will ever find the dataset in the first place. The prompt's point about these platforms being bewildering seas of information is not wrong.
Corn
It's the difference between having a library and having a warehouse. A library has a catalog, a librarian, some sense of what's where. A warehouse has boxes. Most of these portals are warehouses with a search bar that was clearly designed by someone who hates you.
Herman
CKAN, which is the platform underneath a huge number of these — it's the open source data portal software that Israel uses, that the US uses for data dot gov, that the UK uses, that something like forty national governments and dozens of cities run on — CKAN is a perfectly capable piece of software. But it's a framework. It's not a user experience. If you install CKAN and do nothing else, what you get is basically a spreadsheet with a URL. The prompt's observation that translation was a nightmare on the Israeli portal — that's not a CKAN problem, that's a deployment problem. Someone had to decide to invest in multilingual metadata, and they didn't.
Corn
Or they did it halfway. Which in some ways is worse, because you can see the ghost of the intention.
Herman
So let's do what the prompt asks and look at the positive examples, because they do exist. And I want to start with the one that I think most people in the open data world point to as the gold standard, which is the UK's data dot gov dot uk. Now, the UK launched this in twenty ten, which makes it one of the older national portals. But what made it different wasn't the launch — it was the institutional scaffolding around it.
Corn
Scaffolding meaning what, specifically?
Herman
They created something called the Open Data Institute in twenty twelve, co-founded by Tim Berners-Lee and Nigel Shadbolt. The ODI wasn't just a cheerleader — it was an active intermediary. It ran training programs for civil servants on how to publish data well, not just how to dump a CSV. It built partnerships with newsrooms. It funded startups that built products on top of government data. And crucially, it created a feedback mechanism where data users could flag problems — missing datasets, formatting issues, broken links — and those flags actually went somewhere.
Corn
It wasn't "upload and walk away." There was a living relationship between the publisher and the users.
Herman
And that produced some useful outcomes. One of the canonical examples is the Prescribing Data project. The NHS publishes data on what medications are prescribed by every GP practice in England — millions of rows. It's granular down to the individual practice and the individual drug. And it was published as open data. What happened was that a group of data journalists and researchers started analyzing it and found massive variations in prescribing patterns — some practices were prescribing brand-name drugs where generic equivalents existed, costing the NHS hundreds of millions of pounds. That analysis got picked up by newspapers, it got debated in Parliament, and it directly led to changes in prescribing guidelines that saved the NHS something like two hundred million pounds a year.
Corn
That's the loop. That's exactly the loop the prompt is describing — data goes out, analysis comes back, policy changes. And it's not just "here's a number, be impressed." It's money.
Herman
Real money, demonstrable. And the thing is, the NHS didn't publish that data thinking "someone will find prescribing inefficiencies." They published it because it was on a list of datasets that could be opened. The value was discovered by the users. But the discovery was possible because the data was published with enough care that it was actually usable — proper documentation, consistent formatting, regular updates.
Corn
This is where I want to push on something. Because the UK example is often held up, and it's real, but it's also a country with a fairly robust institutional culture around data. What about places where the starting point is messier? Because the prompt's experience with the Israeli portal wasn't "this data is bad," it was "I can't find the data I actually want, and what I can find is airport vendors."
Herman
Let me give you a municipal example that I think is instructive, because cities sometimes move faster than national governments on this stuff. Chicago's open data portal launched in twenty ten and expanded substantially under Mayor Rahm Emanuel, and it now has something like six hundred datasets. But the interesting part isn't the volume. It's that Chicago embedded open data into its actual governance processes in a way that created a feedback loop.
Corn
What does "embedded" mean?
Herman
The city created a dedicated role — the Chief Data Officer — and gave that office actual authority. Every city department had to publish certain datasets on a schedule, not as a one-off. The CDO's office ran regular "open data meetups" where they'd bring in civic hackers, journalists, community groups, and say "here's what we just published, what can you do with it, what do you need next?" And there's a specific outcome I want to highlight, which is the food inspection data.
Corn
Restaurant health inspections?
Herman
Chicago published every restaurant health inspection result as structured open data. What happened was that a group of civic coders built an app — actually several apps — that mapped those inspections onto a searchable interface. So you could pull up any restaurant and see its inspection history. But here's where it gets interesting: the city noticed that the app was being used, and they started using the usage data to prioritize inspections. Restaurants with recent violations that were getting a lot of lookups got inspected again sooner. The data loop created an enforcement loop. And the whole thing was driven by the fact that the data was published in a format that made third-party use trivial.
Corn
That's a nice inversion of the usual dynamic. Usually it's "government publishes, citizens consume." Here it was "government publishes, citizens build tools, government uses those tools to do its job better."
Herman
Chicago isn't alone. New York City has a similar story with its 311 data. The city publishes every 311 service request — noise complaints, pothole reports, broken streetlights — as open data, updated daily. Researchers at NYU analyzed the data and found that response times for complaints in wealthier neighborhoods were systematically faster than in lower-income neighborhoods. That finding made it into a City Council hearing, and it directly informed changes in how the city allocates its repair crews. Again — data goes out, analysis comes back, policy responds.
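The 311 analysis described here boils down to grouping service requests by area and comparing time-to-close. A minimal sketch of that computation — the three sample rows and field names are invented for illustration; the real 311 export has millions of requests and many more columns:

```python
import csv
import io
from datetime import datetime
from statistics import median

# Invented sample rows; the real 311 export has millions of requests.
DATA = """borough,created,closed
Manhattan,2023-01-01T08:00,2023-01-01T12:00
Manhattan,2023-01-02T08:00,2023-01-02T10:00
Bronx,2023-01-01T08:00,2023-01-03T08:00
"""

def median_hours_to_close(csv_text: str) -> dict:
    """Median time-to-close, in hours, per area."""
    fmt = "%Y-%m-%dT%H:%M"
    by_area: dict[str, list[float]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        hours = (datetime.strptime(row["closed"], fmt)
                 - datetime.strptime(row["created"], fmt)).total_seconds() / 3600
        by_area.setdefault(row["borough"], []).append(hours)
    return {area: median(v) for area, v in by_area.items()}

print(median_hours_to_close(DATA))  # {'Manhattan': 3.0, 'Bronx': 48.0}
```

The actual NYU finding required controls for complaint type and severity; this sketch only shows the core grouping step that makes such a disparity visible at all.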
Corn
There's a pattern emerging here, which is that the successful examples all have some kind of intermediary layer. It's not just government-to-citizen. It's government-to-intermediary-to-citizen, and then back again. The intermediary is doing the translation work that the government isn't doing itself.
Herman
And I think that's actually the key insight. When you look at open data portals that are useful, you almost always find a thriving ecosystem of intermediaries — data journalists, civic tech groups, academic researchers, open source developers — who are doing the work of turning raw data into something legible. The government's job in this model isn't to make the data beautiful. It's to make the data reliable, well-documented, and consistently available. The intermediaries handle the rest.
Corn
Which brings me back to the prompt's point about the Israeli portal. If the translation is bad, if the column names are in Hebrew and there's no English metadata, you've basically walled off the global intermediary ecosystem. You're relying entirely on local intermediaries who can read Hebrew and have the time to write translation scripts. That's a much smaller pool.
Herman
It's a solvable problem. The prompt mentioned writing a script to handle the translation — that's exactly the kind of thing a well-resourced open data program would have done on the publishing side. It's not technically hard. It's an institutional prioritization question.
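A minimal sketch of the consumer-side translation script being described — rewriting Hebrew column headers to English on download. The header vocabulary here is invented for illustration; a real script would map the portal's actual field names:

```python
import csv
import io

# Hypothetical header mapping, illustrative only; a real script would
# cover the publishing agency's actual vocabulary.
HEADER_MAP = {
    "שם": "name",
    "עיר": "city",
    "תאריך": "date",
}

def translate_headers(csv_text: str) -> str:
    """Rewrite a CSV's header row, leaving unknown headers untouched."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    rows[0] = [HEADER_MAP.get(h.strip(), h) for h in rows[0]]
    out = io.StringIO()
    csv.writer(out).writerows(rows)
    return out.getvalue()

sample = "שם,עיר\nSteimatzky,Tel Aviv\n"
print(translate_headers(sample).splitlines()[0])  # name,city
```

The point Herman makes stands: this is trivial code, which is exactly why it belongs on the publishing side, done once, rather than rewritten by every individual data consumer.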
Corn
Let me ask you about a case that I think complicates the rosy picture. What about datasets that are sensitive, or where publication creates perverse incentives? I'm thinking of crime data. If you publish granular crime statistics, you get neighborhood-level transparency, which is good, but you also get real estate agents steering people away from certain areas, which can deepen segregation.
Herman
That's a real tension, and it's one the open data community has been grappling with for years. The standard response used to be "publish everything, let the users sort it out." That position has softened considerably. The current thinking, which I think is more mature, is that you do a privacy and equity review before publication, and you accept that some datasets need to be aggregated or anonymized in ways that reduce their granularity but protect people.
Corn
That creates a new gatekeeping problem. Who decides what's too sensitive?
Herman
And that's why the institutional design matters so much. If the Chief Data Officer reports to the mayor's political staff, you get one set of decisions. If they report to an independent commission, you get another. The UK has a reasonably good model here — their Open Data Institute operates at arm's length from government, and they have a Data Ethics Framework that's publicly documented. It's not perfect, but it's transparent about the tradeoffs.
Corn
Let me pull us toward a developing-world example, because I think the UK and Chicago stories are useful but they're also wealthy, English-speaking, high-institutional-capacity contexts. What does open data done well look like in a place with fewer resources?
Herman
I want to talk about Moldova. And I realize that sounds like the setup to a joke, but it's not.
Corn
Moldova's open data program: a punchline waiting to happen, but go on.
Herman
Moldova launched its open data portal in twenty eleven, and it was, by any measure, a small, resource-constrained effort. But they did something clever. They focused on a handful of high-value datasets — government spending, procurement contracts, public salaries — and they published them in machine-readable formats with really clear documentation. And they partnered with a local civic tech organization called Monitor, which built a tool called Banipublic dot md — "public money" — that visualized government expenditure in a way that ordinary citizens could actually understand.
Corn
What came of it?
Herman
The tool was used by journalists to identify procurement irregularities — contracts awarded to companies that didn't seem to exist, that kind of thing. Several of those investigations led to official audits, and at least one led to a criminal prosecution. But the broader effect was that government agencies started behaving differently because they knew their spending data was going to be public and scrutinized. The transparency changed behavior before the audits did.
Corn
That's the deterrence effect. Sunlight as disinfectant, but with an actual mechanism behind the metaphor.
Herman
And Moldova's not alone. Ukraine has a platform called ProZorro that covers all public procurement — every government purchase above a certain threshold is published in real time, with full documentation. It was built after the twenty fourteen revolution specifically as an anti-corruption measure, and it's been remarkably effective. The World Bank estimated it saved Ukraine something like six billion dollars in its first six years of operation, through reduced corruption and more competitive bidding.
Corn
Six billion is not a rounding error. That's real.
Herman
The ProZorro model is interesting because it wasn't just "here's a CSV." It was a purpose-built platform with a clear use case: you want to know what the government is buying, from whom, for how much. The interface was designed for that specific question. It wasn't a general-purpose data portal with a search bar and good luck. It was a tool.
Corn
This is making me think about the distinction between a data portal and a data product. A portal says "we have many datasets, explore." A product says "we know what question you're asking, here's the answer, and here's the raw data if you want to dig deeper."
Herman
That's a really useful framing. And I think the most successful open data efforts are the ones that blur the line between portal and product. The UK's prescribing data is basically a product — it's published in a consistent schema, it's well-documented, there are known use cases, and there's an ecosystem of tools built around it. Chicago's food inspection data is a product. Moldova's public spending data is a product. The airport vendor list is... not a product.
Corn
It's a CSV-shaped shrug.
Herman
A CSV-shaped shrug. I'm going to use that.
Corn
If we're advising a government that wants to do this well — and I realize nobody's asking us, but humor me — what's the playbook? What are the three or four things that distinguish a useful open data effort from a performative one?
Herman
I'd say the first one is: start with use cases, not with datasets. Don't do an inventory of everything you have and dump it online. Ask "what questions do citizens, journalists, and researchers actually want to answer?" and publish the data that answers those questions. The UK didn't start by publishing everything — they started with a list of about nine thousand datasets that were identified as high-value through consultation with users.
Corn
Curation before publication.
Herman
The second is: invest in data quality and documentation. This sounds boring, but it's the thing that separates usable data from landfill. Column names in plain language, consistent date formats, clear definitions of what each field means, documentation of known limitations and gaps. The prompt's experience with untranslated Hebrew column names is exactly the kind of thing that kills usability. A well-run portal would have caught that.
Third is: build or support the intermediary layer. Fund civic tech fellowships, run hackathons that are actually connected to government needs, hire data journalists in residence. The intermediaries are the force multiplier. A government can publish a thousand datasets, but if nobody's building tools and telling stories with them, the data sits there inert. Chicago's meetups, the UK's ODI partnerships, Moldova's work with Monitor — these are all examples of actively cultivating the intermediary ecosystem.
Corn
Fourth, I'd add: close the loop. If someone finds something in your data — an inefficiency, a pattern, a problem — have a process for that finding to actually reach the relevant decision-makers. The prescribing data story works because Parliament paid attention. The Chicago food inspection story works because the health department adjusted its priorities. Without that loop, open data is just a suggestion box that nobody reads.
Herman
That loop is the hardest part institutionally, because it requires government agencies to be willing to be corrected by outsiders. That's a cultural shift, not a technical one.
Corn
It's also a political shift. If your data reveals that your administration's policies aren't working, the temptation to bury the data or stop publishing it is enormous. The open data commitment has to survive bad news.
Herman
Which brings us to an uncomfortable question: how many of these portals actually survive a change of government? The answer is: not all of them. The US data dot gov has been through multiple administrations with very different attitudes toward transparency, and the continuity has been, let's say, uneven. Some datasets have disappeared. Some have stopped being updated. The portal itself persists, but the commitment behind it fluctuates.
Corn
Is there a structural fix for that? Something that makes open data harder to kill when the political winds shift?
Herman
The UK's open data efforts were backed by a legal requirement for public bodies to publish certain datasets, and that requirement didn't disappear when the government changed. The US passed the OPEN Government Data Act in twenty nineteen, which codified open data as a default requirement for federal agencies. That's not a guarantee — laws can be ignored or underfunded — but it raises the cost of walking away.
Corn
I want to go back to something the prompt raised that I think we've danced around. The distinction between data that lives on the open data portal and data that lives on the statistical agency's own site. The prompt found the consumer price index data on the Central Bureau of Statistics site, not on data dot gov dot il. And that's not a failure of open data — it's a symptom of something structural. Statistical agencies often have their own publication pipelines, their own formats, their own schedules, and they're not going to migrate all of that to a central portal just because someone stood up a CKAN instance.
Herman
Honestly, they probably shouldn't. The statistical agency's site is often better for the data it specializes in, because it's purpose-built for that data. The CPI data on the CBS site is going to have better documentation, more detailed breakdowns, historical series, methodology notes — all the things that a general-purpose portal strips out. The portal's value isn't in replacing those specialized sites. It's in being a discovery layer that points you to them.
Corn
Except that the discovery layer doesn't work if the statistical agency's data isn't indexed there. Which, in the Israeli case, it sounds like it wasn't.
Herman
And that's a metadata problem. The portal should be harvesting metadata from the statistical agency, not asking the agency to re-publish everything. CKAN actually supports this — it has a harvester plugin that can pull metadata from other catalogs. But someone has to configure it, and someone has to maintain the relationship with the statistical agency. Again, it's not a technical problem. It's an institutional coordination problem.
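The discovery-layer idea can also be checked from the outside: CKAN exposes a standard Action API, so anyone can ask the catalog whether an agency's datasets are indexed at all. A sketch that builds such a query — the portal URL and organization slug are placeholders for illustration, not verified endpoints:

```python
from urllib.parse import urlencode
# import urllib.request  # uncomment to actually query a live portal

def search_url(portal: str, org: str, query: str = "*:*") -> str:
    """Build a CKAN package_search query filtered to one organization."""
    params = urlencode({"q": query, "fq": f"organization:{org}", "rows": 5})
    return f"{portal}/api/3/action/package_search?{params}"

# Placeholder portal and organization slug, for illustration only:
print(search_url("https://data.gov.il", "cbs", "consumer price index"))
```

An empty result set for a statistical agency's organization is exactly the symptom being discussed: the data exists on the agency's own site, but the portal's metadata harvest never picked it up.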
Corn
We're circling back to the same theme: the technology is rarely the bottleneck. The bottleneck is whether anyone is paid to care.
Herman
Whether the people who are paid to care have the authority to make other agencies cooperate.
Corn
Let me throw one more example at you that I think is interesting, because it's a case where open data created value in a way nobody predicted. Taiwan's government publishes real-time air quality data from monitoring stations across the island. It was originally published for environmental researchers. But what happened was that a group of citizen developers built an app called AirBox that combined the government data with data from low-cost sensors that they distributed to schools and community centers. The result was a hyperlocal air quality map that was more granular than anything the government could produce on its own. And it became so popular that the government started incorporating the citizen sensor data into its official monitoring system.
Herman
That's the loop again, but with a twist — the citizens aren't just analyzing the data, they're contributing to it. It's open data becoming collaborative data.
Corn
The key detail is that the government was willing to accept that the citizen data might be messier than its own data, and that the tradeoff was worth it for the coverage. That's a level of institutional humility that I don't think we see very often.
Herman
There's a similar story in Japan after the Fukushima disaster. The government's radiation monitoring data was sparse and slow to update, so a group called Safecast built a network of citizen radiation sensors and published everything as open data. Their data ended up being more comprehensive and more trusted than the official data, precisely because it was independent. And over time, the government started using Safecast data to fill gaps in its own monitoring.
Corn
Which is both inspiring and a little bit damning, right? The citizens had to build the thing the government should have built, because the government wasn't moving fast enough.
Herman
Yes, but the government was eventually willing to incorporate it. That's the part that matters for our discussion. An open data culture that's healthy has to be able to say "the outsiders did it better, and we'll use their work." That's a hard thing for a bureaucracy to say.
Corn
Alright, let me try to synthesize what we've been circling around. The prompt asked for examples of open government data done well, where transparency goes beyond uploading a random CSV. We've got the UK's prescribing data saving hundreds of millions of pounds. We've got Chicago's food inspections creating an enforcement feedback loop. We've got Moldova and Ukraine using procurement transparency to fight corruption, with real money saved and real prosecutions. We've got Taiwan and Japan showing collaborative models where citizen data supplements official data. What ties all of these together?
Herman
I'd say three things. First, they all started with a clear use case, not a data dump. Second, they all invested in making the data actually usable — proper documentation, consistent formats, regular updates. And third, they all built or supported an intermediary layer of journalists, developers, and researchers who turned raw data into something actionable.
Corn
The fourth thing, which I think is the hardest: they all had a government that was willing to be changed by what the data revealed. That's the part you can't fake. You can stand up a CKAN instance in an afternoon. You can upload a few hundred CSVs. You can issue a press release about digital transformation. But if you're not prepared for someone to find something you'd rather they didn't find, and to act on it, you're not doing open data. You're doing open theater.
Herman
That's exactly the right term. And I think that's what the prompt was picking up on — the feeling that a lot of these portals are stage sets. The props are there, the lighting is good, but there's no actual play happening.
Corn
The airport vendor list as set dressing.
Herman
The airport vendor list is the potted plant in the corner of the stage. It's technically part of the production, but nobody came to see it.
Corn
I do want to acknowledge one counterpoint before we wrap, which is that sometimes performative transparency is a step on the way to real transparency. A government that publishes airport vendor lists today might publish procurement data tomorrow, if there's enough pressure. The portal existing at all creates a hook that advocates can use. "You already have the platform — now put something useful on it."
Herman
That's a fair point. And there are cases where that's exactly what happened. The UK's portal started with a lot of low-value datasets and gradually improved as user feedback accumulated. The US portal went through a similar evolution. The danger is when the portal is treated as the endpoint rather than the starting point. When the press release about launching the portal is the last thing that happens, not the first.
Corn
That's where the institutional commitment matters. If there's no budget for maintenance, no staff for curation, no process for responding to user feedback, the portal is just going to sit there and rot. Datasets will stop updating. Links will break. The whole thing will become a monument to a moment of enthusiasm that wasn't sustained.
Herman
The half-life of an unmaintained open data portal is surprisingly short. I've seen ones where thirty percent of the links were dead within two years. That's worse than not having a portal at all, because it trains users not to trust the platform.
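The rot Herman describes is straightforward to measure: walk the portal's resource URLs, record each HTTP status, and compute the dead fraction. A sketch of the scoring step — the status values are invented, and the HEAD requests themselves are omitted to keep the sketch offline:

```python
def rot_rate(statuses: list[int]) -> float:
    """Fraction of checked resources whose HTTP status was not 2xx."""
    dead = sum(1 for s in statuses if not 200 <= s < 300)
    return dead / len(statuses)

# Statuses as gathered from HEAD requests against each resource URL
# (values invented for illustration):
print(rot_rate([200] * 7 + [404, 500, 410]))  # 0.3
```

Running a check like this on a schedule, and publishing the result, is the kind of maintenance discipline that separates a living portal from a monument.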
Corn
If we're talking to a government that's considering launching an open data portal, or revitalizing one that's gone stale, what's the one thing we'd tell them?
Herman
Hire someone whose full-time job is making the portal useful, and give them the authority to make other agencies cooperate. Everything else flows from that. The technology is secondary. The policies are secondary. The press releases are secondary. You need a person who wakes up every day thinking about whether the data is findable, usable, and actually being used. If you're not willing to pay for that person, don't bother launching the portal.
Corn
The Chief Data Officer as the make-or-break variable. I buy that. And I'd add: give that person a public feedback channel, and require them to publish what they heard and what they did about it. The transparency has to apply to the transparency effort itself.
Herman
I love it.
Corn
We're that kind of podcast.

And now: Hilbert's daily fun fact.

Hilbert: In nineteen forty three, a German cargo ship carrying a hold full of dried bladderwrack seaweed ran aground on the Skeleton Coast of Namibia. The crew survived, but the seaweed — harvested from the North Atlantic for alginate production — scattered across the beach and briefly created a small, accidental kelp-processing industry among local fishermen who had never encountered that species before. The entire operation collapsed within six months when termites ate the drying racks.
Corn
Termites ate the drying racks. Of course they did.
Herman
A German seaweed shipwreck creating a six-month micro-industry in the Namib Desert. I have no follow-up.

Corn
So here's the forward-looking thought I want to leave with. The open data movement is about fifteen years old now. The platforms exist. The CKAN instances are spun up. The question for the next decade isn't "will governments publish data" — enough of them will, at least performatively. The question is whether the feedback loops will close. Whether the data will actually change how governments behave. And that depends less on technology and more on whether citizens, journalists, and researchers keep showing up to do the hard work of turning raw data into accountability. The portals are just infrastructure. The accountability is the product.
Herman
The examples we found — the UK, Chicago, Moldova, Taiwan — suggest that it can work. It's not a fantasy. The loop can close. But it takes sustained effort on both sides, and it takes a government that's willing to hear things it doesn't want to hear. That's rare, but it's not impossible.
Corn
Thanks to our producer Hilbert Flumingtop for keeping this operation running. This has been My Weird Prompts. You can find every episode at myweirdprompts dot com, and if you've got a prompt you'd like us to dig into, that's where you send it. We'll be back soon.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.