Daniel sent us one of those prompts where you read it and think, okay, this is absurd, and then you read it again and realize the absurdity is the point, and actually it's a genuinely interesting question. He's proposing what he calls an AI agentocracy — taking a group of about a hundred volunteers, handing governance decisions over to a council of AI agents for a month, and seeing what happens. The question isn't whether this is a good idea for actual government. It's whether the experiment itself could teach us something about how agentic AI might help human decision-makers navigate the hard problem of balancing multiple stakeholders.
The thing is, the pieces are all there. They're just sitting on the shelf waiting for someone to bolt them together. You've got the Karpathy LLM council architecture, which Daniel's actually used for personal decisions — career planning, house hunting. You've got the sortition history we've talked about. You've got Ireland's citizens' assemblies as a real-world model of what happens when you take random people and give them structured deliberation. The experiment isn't science fiction. It's logistics.
The prompt is basically: has anyone tried this, and if not, how would you design the experiment? So let's start with the first part. Has anyone actually done this?
Not in the form Daniel's describing, no. I went looking. There's no published experiment where a group of humans voluntarily ceded real decision-making authority to an AI council and then measured outcomes. But the adjacent attempts are fascinating. There was a project called the Synthetic Party in Denmark — an AI-driven political party that tried to stand in the twenty twenty-two parliamentary elections. The AI, called Leader Lars, was trained on the platforms of Danish fringe parties going back to nineteen seventy. It never actually made the ballot — it collected, I think, only a few hundred of the roughly twenty thousand voter declarations it needed to qualify. But it was a genuine attempt to put an AI on a ballot. Not as a tool for a human candidate — as the candidate.
That's the name they went with.
That's the name. And then there's the more academic side. A group at DeepMind, actually back in twenty twenty-two, published a paper on what they called democratic AI — using a deep reinforcement learning system, not a language model, to find policy trade-offs that human participants in an economic game judged fairer than what human-designed mechanisms produced. The AI didn't govern anything real, but it demonstrated that a machine agent could mediate between conflicting preferences in a way that felt more legitimate to the participants than the status quo.
The lineage is there. Synthetic Party in Denmark, DeepMind's democratic AI paper, Ireland's citizens' assemblies as the sortition proof-of-concept, and the Karpathy council as the technical architecture. The experiment Daniel's describing is basically the crossover episode nobody's filmed yet.
And the Ireland piece is actually crucial here, because it answers the most obvious objection. The objection is: random people don't know enough to make good decisions. Give them complex policy questions and they'll be lost. Ireland tested that. The citizens' assembly that tackled the abortion referendum — which was one of the most divisive, legally intricate issues in Irish history — consisted of ninety-nine randomly selected citizens plus an independent chair. It was the earlier Constitutional Convention that mixed sixty-six citizens with thirty-three politicians. They heard expert testimony. And they produced recommendations that ended up forming the basis of the constitutional amendment that passed in twenty eighteen. Random people, given structure and information, made a decision the country accepted.
That's the key insight that makes the AI agentocracy experiment not just a joke. The question isn't whether AI should replace human judgment. It's whether the deliberation process — the thing the citizens' assembly does with expert testimony and structured discussion — can be accelerated or augmented by having AI agents model the stakeholder landscape.
Let me actually walk through the Karpathy council architecture, because it's the engine that would make this work. The idea is deceptively simple. You take the same prompt and send it to multiple AI models. Each model produces its own response independently. Then you take all the responses, anonymize them, and send them back to the models with a request to critique and refine. The models don't know whose response is whose. They just see arguments and counter-arguments. Then you iterate. After a few rounds, patterns emerge. Agreement coalesces around certain positions. Disagreements get sharpened and clarified rather than muddied.
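For anyone following along who wants to build this, the shape of that loop is roughly the following — a minimal sketch, not Karpathy's actual code. The model names and the call_model function are stand-ins for whatever APIs you'd wire up; the point is the structure: independent drafts, anonymized critique rounds, then a synthesis.

```python
import random

def call_model(model: str, prompt: str) -> str:
    """Placeholder for a real LLM API call; swap in whatever provider you use."""
    raise NotImplementedError("wire this up to your model provider of choice")

COUNCIL = ["model-a", "model-b", "model-c"]  # three to five distinct models

def run_council(question: str, rounds: int = 3) -> str:
    # Round zero: each model drafts an answer independently.
    drafts = {m: call_model(m, question) for m in COUNCIL}

    for _ in range(rounds):
        # Anonymize: shuffle and strip authorship so critiques target
        # arguments rather than authors.
        anonymous = list(drafts.values())
        random.shuffle(anonymous)
        bundle = "\n\n---\n\n".join(
            f"Response {i + 1}:\n{text}" for i, text in enumerate(anonymous)
        )
        critique_prompt = (
            f"Question: {question}\n\n"
            f"Here are {len(anonymous)} anonymous responses:\n\n{bundle}\n\n"
            "Critique their weaknesses, then write an improved answer of your own."
        )
        # Each model revises in light of the whole anonymized set.
        drafts = {m: call_model(m, critique_prompt) for m in COUNCIL}

    # Final pass: one model acts as chair and merges the surviving positions.
    # In practice you might rotate the chair or use a separate model entirely.
    synthesis_prompt = (
        f"Question: {question}\n\nFinal positions:\n\n"
        + "\n\n---\n\n".join(drafts.values())
        + "\n\nWrite a single decision with its reasoning, noting remaining disagreements."
    )
    return call_model(COUNCIL[0], synthesis_prompt)
```

The anonymization is the design choice that matters: it's what turns a pile of model outputs into something closer to blind peer review.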
Daniel's point about using this for house hunting is actually instructive. He and Hannah had different preferences. He fed the council a transcript of their discussion, their specs, their constraints. The council produced a synthesis that did justice to both sets of preferences. That's not just aggregation. That's something closer to mediation.
It's computationally assisted consensus. And that's the part that maps onto governance. Governance is not about finding the mathematically optimal policy. It's about finding the policy that enough stakeholders can live with. The DeepMind paper showed that AI was surprisingly good at this — better than human-designed mechanisms at producing outcomes that felt fair to participants. The mechanism wasn't majority rule. It was something more nuanced, something that weighted minority concerns in a way that pure voting doesn't.
Which brings us to the experiment design. Daniel's asking for something concrete. A plan someone could actually execute. So let's build it.
Step one: recruitment. A hundred volunteers. That number is actually well-chosen. It's large enough to be a meaningful sample — you can do statistical analysis on a hundred people — but small enough to manage. You'd want demographic diversity. Age, gender, occupation, political leaning. You're not trying to be perfectly representative. You're trying to avoid the obvious failure mode where everyone is a twenty-five-year-old software engineer who already thinks AI is the answer to everything.
You'd also need a consent structure that's informed. These people are handing real decisions to an AI council for a month. The decisions can't be life-altering — Daniel's explicit about guardrails — but they have to be real enough that participants feel the weight of them. If it's all hypothetical, you learn nothing.
Which brings us to step two: the decision domain. This is the hardest design problem. What decisions do you actually give the council? Daniel mentioned municipal decisions as a possibility. I think that's the right level. Not national policy — too high stakes, too many confounding variables. Not what's for dinner — too trivial. Municipal decisions hit the sweet spot. Things like: how should the community allocate a small shared budget? What should the rules be for use of a common space? What priority order for minor infrastructure improvements?
The key is that the decisions have to be things the participants would otherwise have an opinion about. You need a baseline. You need to know what the humans would have decided without the AI council, so you can compare.
So here's the structure I'd propose. A one-month experiment with a weekly cycle. At the start of each week, participants receive a briefing document — a neutral synopsis of the decision context. Here's the situation. Here are the constraints. Here are the stakeholder groups affected. Here are the trade-offs. Participants then submit their own initial preferences. That's your human baseline.
Then the AI council gets the same briefing?
The AI council gets the same briefing, plus the anonymized human preferences. Not to be bound by them, but to understand the preference landscape. Then the council runs — multiple models, multiple rounds of blind critique and refinement, just like Karpathy's architecture. At the end of the deliberation, the council produces a decision, with a written explanation of its reasoning. Participants then see the decision and rate it on multiple dimensions: fairness, practicality, whether it reflects their preferences, whether they'd accept it as binding.
Then the crucial step: at the end of the month, you compare the AI council's decisions against what the humans would have decided on their own. You can do that because you captured the initial preferences. You can measure satisfaction with the AI decisions versus satisfaction with whatever decision-making process the group normally uses.
You can also measure something subtler. The DeepMind paper looked at legitimacy — did participants feel the process was fair, even when the outcome wasn't what they wanted? That's arguably more important than satisfaction with the outcome. Political systems survive losing. They don't survive the perception that the game is rigged.
The metrics would be: satisfaction with outcomes, perceived fairness of process, willingness to accept decisions you disagree with, and some measure of how well the decisions actually resolved the trade-offs. That last one is tricky because there's no objective answer to a value trade-off. But you can look at things like: did the decision leave any stakeholder group completely alienated? Did it produce a coherent rationale that acknowledged the trade-offs rather than pretending they didn't exist?
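To make those metrics concrete, here's one way you might capture and roll up the participant ratings. The dimension names, the one-to-seven scale, and the below-three alienation threshold are our own choices for illustration, not anything Daniel specified.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Rating:
    participant_id: str
    stakeholder_group: str      # e.g. a demographic or interest group
    fairness: int               # 1-7 Likert
    practicality: int           # 1-7
    reflects_preferences: int   # 1-7
    accept_as_binding: bool

def summarize(ratings: list[Rating]) -> dict:
    by_group: dict[str, list[int]] = {}
    for r in ratings:
        by_group.setdefault(r.stakeholder_group, []).append(r.fairness)
    return {
        "mean_fairness": mean(r.fairness for r in ratings),
        "mean_practicality": mean(r.practicality for r in ratings),
        "acceptance_rate": mean(r.accept_as_binding for r in ratings),
        # Crude "alienation" flag: any group whose average fairness falls below 3/7.
        "alienated_groups": [g for g, scores in by_group.items() if mean(scores) < 3],
    }
```

The interesting comparison is then summarize on the AI-council decisions against summarize on whatever the baseline process produced for the same question.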
The guardrails question is where this gets real. Daniel's right that you can't let the AI council make decisions that could cause significant harm. But you also can't constrain it so tightly that the experiment becomes pointless. I think the answer is scope limitation plus a human veto. The council's decisions apply only within a bounded domain that participants have pre-agreed to. And there's a human override — if a decision is harmful or absurd, the experimenters can block it. But the override has to be transparent and rare. If you use it more than once or twice, the experiment failed.
The override itself becomes a data point. If the AI council keeps producing decisions that require a human veto, that's telling you something important about the gap between what looks good in deliberation and what works in practice.
There's a subtler guardrail question too. The briefing documents. Who writes them? What counts as a neutral synopsis? That's not a trivial problem. The framing of the decision context will shape the outcomes. Ireland's citizens' assemblies spent a lot of time on this — they had expert advisory groups, they had legal frameworks for what constituted balanced information. An AI agentocracy experiment would need something similar. Maybe you have the briefing documents written by a separate team, reviewed for balance, and then summarized by the AI council as its first task — produce a shared understanding of the facts before deliberating on the decision.
That's clever. Make fact-finding the first deliberation. The council has to agree on what the situation is before it can argue about what to do. That mirrors what good human committees do, and it gives you a check on whether the AI is introducing bias at the framing stage.
Now, the technical implementation. Daniel says the output constraints are the easy part, and he's right. Getting an LLM to produce a structured decision with a vote tally and a reasoning section is straightforward. The harder technical problem is the shifting context — the weekly synopsis of what happened in the community. That requires someone to actually observe and summarize. You can't automate that part without introducing another layer of AI that itself needs validation.
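Just to illustrate how easy the easy part is: the output constraint amounts to asking every council run to fill in something like the shape below. Every field name here is illustrative rather than any kind of standard, and the example content is invented.

```python
# Illustrative shape for the council's weekly output; the exact fields are an
# assumption, not a spec — the point is that this structure is easy to enforce.
DECISION_RECORD = {
    "week": 2,
    "question": "How should the shared budget of 1,000 units be allocated?",
    "decision": "Allocate 600 to space maintenance, 300 to events, 100 to reserve.",
    "vote_tally": {"model-a": "for", "model-b": "for", "model-c": "against"},
    "reasoning": "Maintenance was the only item every stakeholder group ranked in its top two.",
    "dissent": "model-c argued the reserve should be larger given uncertain winter costs.",
    "stakeholder_impact": {
        "frequent users": "gain: better upkeep; lose: fewer events",
        "occasional users": "gain: reserve cushion; lose: little direct benefit",
    },
}
```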
You need a human clerk. Someone whose job is to produce the weekly briefing, attend the discussions, capture the relevant facts. That's not glamorous but it's essential. The quality of the briefing determines the quality of the deliberation.
You'd also need to decide which models to use. The Karpathy council works by having multiple models critique each other. If you use only one model, you're not getting the benefit of diverse perspectives. If you use too many, the deliberation becomes chaotic. I'd suggest three to five different models — enough diversity to catch each other's blind spots, not so many that the process becomes unmanageable.
You'd want models with different training philosophies. A mix of the major frontier models, maybe one that's been fine-tuned for reasoning, one that's more general. The point is to avoid monoculture. The whole value of the council architecture is that different models notice different things.
Let me address the elephant in the room. Has anyone tried this seriously? I said no, but there's a partial exception worth mentioning. There's a project called the AI Government Simulation that ran in twenty twenty-three out of the University of Tokyo. It wasn't a real community — it was a simulated one. But they had AI agents playing the roles of legislators, bureaucrats, and citizens, and they ran policy scenarios through the system. The findings were mixed. The AI legislators were good at identifying trade-offs and generating options. They were bad at anticipating second-order consequences that weren't explicitly in their training data. The researchers concluded that AI could be a useful deliberative tool but shouldn't be the decision-maker.
Which is basically Daniel's thesis. The experiment isn't about replacing human governance. It's about understanding where AI deliberation adds value and where it breaks down.
And I think that's what you'd actually learn from this experiment. Not whether AI is better than humans at governing — that's the wrong question. The right question is: at what points in the decision-making process does AI-assisted deliberation produce insights that humans alone might miss? And at what points does it produce confident-sounding nonsense that a human has to catch?
Let's talk about what could go wrong. Because an experiment like this has failure modes that are themselves instructive.
The most obvious one is sycophancy. Language models tend to agree with users. If the human participants express strong preferences, the AI council might simply reflect those preferences back rather than deliberating. You'd need to test for this — maybe by including some decisions where the human preferences are deliberately contradictory or where the optimal solution requires disappointing a majority.
Another failure mode: the council produces decisions that sound reasonable but are actually unworkable because they miss some practical constraint that every human in the community knows but that didn't make it into the briefing document. The AI doesn't know that the community center can't be booked on Thursdays because that's when the roofers come. Tacit knowledge is a real thing.
Tacit knowledge is a huge blind spot for any AI system. The Ireland citizens' assemblies worked partly because the participants brought lived experience — they knew things about their communities that no briefing document would capture. An AI council doesn't have that. You'd need to design the experiment to test for this gap. Maybe include some decisions that require local knowledge, and see whether the council asks the right questions or just barrels ahead with incomplete information.
There's also the accountability problem. If the AI council makes a bad decision, who do participants blame? In a real democracy, accountability is the mechanism that keeps decision-makers tethered to consequences. An AI council has no skin in the game. It doesn't have to live with the outcome.
That's the deepest objection to any form of AI governance, and it's why Daniel's framing this as an experiment rather than a proposal is exactly right. The experiment doesn't test whether AI should govern. It tests what happens when AI deliberation is inserted into a governance process that still has humans at the endpoints — humans setting the agenda, humans providing the context, humans accepting or rejecting the output.
What would a successful experiment look like? What would we hope to learn?
I think there are three levels of findings. Level one is technical: does the Karpathy council architecture actually produce higher-quality deliberation than a single model? Do the blind critique rounds surface issues that a single pass would miss? Level two is perceptual: do participants find the AI-mediated process more or less legitimate than whatever process they normally use? Does the transparency of the deliberation — the fact that you can read the reasoning — compensate for the weirdness of having an AI make the call? Level three is practical: are there specific points in the decision pipeline where the AI adds clear value? Maybe it's at the option-generation stage — here are five ways to solve this that nobody thought of. Maybe it's at the trade-off articulation stage — here's exactly what each group gains and loses under each option.
The option-generation use case feels under-explored. Human committees are bad at generating options. They tend to converge on two or three familiar alternatives and then argue about them. An AI council that's instructed to generate novel options, and then evaluate them, could expand the option space in ways that are useful even if the final decision stays with humans.
That's actually what I'd want to test most. Not AI as decider, but AI as option-generator and trade-off mapper. Give me a landscape of possibilities that I wouldn't have seen on my own, with the winners and losers clearly labeled, and let me make the call as a human. That's augmentation, not replacement.
That maps onto Daniel's point about parliamentarians. The hard part of legislating isn't casting votes. It's understanding what different stakeholders need and finding configurations that satisfy enough of them to hold. If an AI council can accelerate that part — mapping the stakeholder landscape, identifying zones of possible agreement — that's useful regardless of whether you'd ever let the AI vote.
Let me sketch the one-month timeline concretely. Week one: onboarding and baseline. Participants get trained on the process, submit their demographic info and political attitude surveys, and make initial decisions on a set of practice scenarios without AI assistance. That establishes the human baseline. Week two: first real decision cycle. The AI council deliberates, produces a decision, participants rate it. Week three: second decision cycle, with a different type of decision — maybe one that's more values-laden, to see if the council handles normative questions differently from practical ones. Week four: third cycle plus final assessment. Participants fill out detailed surveys comparing the AI-mediated process to their normal decision-making. Exit interviews for qualitative data.
The values-laden versus practical distinction is important. Some decisions are mostly about means — what's the most efficient way to achieve an agreed goal? Others are about ends — what should the goal be? I'd expect AI to perform better on means than ends, but I'm not sure, and that's exactly the kind of thing the experiment would test.
You'd want to vary the decision type systematically. Budget allocation is mostly means. Rule-making for a shared space is mixed. Setting priorities among competing values — that's ends. Run all three through the council and see where the satisfaction scores diverge.
The other variable I'd want to test is transparency. In one condition, participants see the full deliberation — all the back-and-forth between models, the critiques, the refinements. In another condition, they only see the final output with a summary of reasoning. Does seeing the sausage get made increase or decrease trust?
My guess is it increases trust, but I'm not confident. There's research on human committees showing that transparency of deliberation can sometimes reduce confidence — people see the messiness and the compromise and conclude the process was flawed, even if the outcome was good. With AI, it might be different because the messiness is less emotionally charged. Nobody's ego is on the line when Claude disagrees with Gemini.
That's actually a profound point. One of the problems with human political deliberation is that disagreement gets personal. Positions become identities. With an AI council, the disagreement is purely computational. You can watch models critique each other without anyone feeling attacked. That might make the deliberation more useful as information, even if it's less satisfying as drama.
Which suggests a counterintuitive possibility: AI deliberation might be more useful precisely because it's less human. The models don't have careers to protect or factions to appease. They just process the arguments. Now, they have their own biases — training data biases, architectural biases, the sycophancy problem I mentioned. But those are different from the incentives that distort human political deliberation.
Different biases might be an improvement over the same biases. If the AI council's blind spots are different from the blind spots of the human political class, then combining them could produce better outcomes than either alone.
That's the strongest argument for running this experiment. Not because AI governance is better — because AI governance plus human oversight might catch errors that neither would catch on its own.
Let's talk about what you'd actually need to run this. Daniel mentioned it's not easy to find a hundred people willing to try something this ludicrous. But I'm not sure it's as hard as he thinks. The quantified-self movement, the rationalist community, the people who do month-long self-experiments with nootropics or sleep tracking — there's a population that would find this interesting.
You could recruit from online communities that are already interested in governance experiments. The EA forum, LessWrong, certain subreddits. The pitch is straightforward: participate in a one-month experiment testing whether AI-mediated deliberation produces fairer group decisions. You don't have to sell them on AI governance. You just have to sell them on the experiment being interesting.
Interesting it would be. Even if the results are messy — especially if the results are messy — you'd learn something. Failure modes are data. If the AI council produces decisions that everyone hates, that's useful information about the gap between deliberative reasoning and lived experience. If it produces decisions that look good on paper but collapse on contact with reality, that's useful too.
The null result would also be interesting. Suppose the AI council performs about as well as a human committee — similar satisfaction scores, similar perceived fairness, similar practical outcomes. That's not a failure. That's evidence that AI deliberation is roughly substitutable for human deliberation in some domains. Which would be a pretty significant finding given how much cheaper and faster AI deliberation is.
The scariest result would be if the AI council significantly outperforms human decision-making on satisfaction and perceived fairness. Because then you have to wrestle with the normative question: if an AI can make decisions that people prefer and find more legitimate, does that create an obligation to use it? Or does it create a danger that we'll cede too much to systems we don't fully understand?
That's the deep question lurking under this whole experiment. I don't think we're anywhere near having to answer it practically. But running the experiment would force us to start thinking about it seriously rather than treating it as science fiction.
Daniel also raised the question of what this might teach us about how human parliamentarians could use agentic AI as a tool. I think the answer is: the experiment would identify specific sub-tasks where AI deliberation adds value. Maybe it turns out that AI is excellent at surfacing minority concerns that majority-rule processes would overlook. Maybe it's good at generating compromise options that split the difference in creative ways. Maybe it's good at detecting inconsistencies in policy proposals — if you pass X, it conflicts with Y that you passed last year.
The inconsistency detection use case is actually huge. Real legislatures pass laws that contradict each other all the time. Nobody reads the entire legal code before voting on a new bill. An AI that could say, "this proposed rule conflicts with sections seventeen and twenty-three of the existing ordinance, here's how" would be valuable regardless of whether you'd let it vote.
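A sketch of what that check could look like, reusing the call_model stub from the earlier council sketch. The prompt wording and the idea of passing section-labelled text are just one plausible approach; in a real legislature you'd need retrieval over the full code rather than pasting it in.

```python
def find_conflicts(proposed_rule: str, existing_sections: dict[str, str]) -> str:
    """Ask a model to flag conflicts between a proposed rule and existing rules.

    existing_sections maps a section label to its text; in practice you'd
    retrieve only the plausibly relevant sections first.
    """
    corpus = "\n\n".join(f"[{label}] {text}" for label, text in existing_sections.items())
    prompt = (
        f"Existing rules:\n{corpus}\n\n"
        f"Proposed rule:\n{proposed_rule}\n\n"
        "List every existing section this proposal conflicts with, quote the "
        "conflicting clause, and explain each conflict in one sentence. "
        "If there are no conflicts, say so."
    )
    return call_model("model-a", prompt)  # call_model stubbed as in the earlier sketch
```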
The experiment is worth running even if the answer to "should AI govern" is an emphatic no. You're not testing AI governance. You're testing AI as a deliberative prosthesis. Something that extends the cognitive reach of human decision-makers without replacing their judgment.
The design Daniel's proposing — a hundred volunteers, a month-long trial, real but bounded decisions, the Karpathy council architecture — is feasible. It would cost almost nothing except time. The models are available. The architecture is well-understood. The evaluation framework is straightforward. The only barrier is finding the participants and designing the decision scenarios.
I'd add one more design element. At the end of the month, you convene the participants for a debrief. Not just surveys — an actual discussion. Let them talk to each other about what it felt like to have decisions made by an AI council. Did it feel alienating? Did they check out because it wasn't their problem anymore, or did they engage more because the reasoning was transparent? The qualitative data from that conversation might be more valuable than any satisfaction score.
The lived experience of being governed by an AI council, even for a month, even for low-stakes decisions, is something we have essentially no data on. That alone would be a contribution.
To answer the prompt directly: no, nobody has tried this exact experiment. The pieces exist — the Synthetic Party in Denmark, the DeepMind democratic AI paper, the Karpathy council architecture, the Ireland citizens' assemblies. But nobody has put them together in the way Daniel's describing. The experiment is feasible, the design challenges are interesting but solvable, and the potential learnings are valuable regardless of the outcome.
If someone does run it — if one of our listeners reads this and thinks, I could do that — we'd want to know what happens. Not because we're rooting for AI governance. Because we're curious about what happens when you take a deliberative technology that didn't exist five years ago and point it at one of the oldest problems in human society: how do we make decisions together without anyone feeling steamrolled?
The ancient Athenians used sortition because they understood something we've forgotten — that voting is not the same as democracy, and that representation through election creates a class of professional politicians whose interests diverge from the people they represent. They saw random selection as a check on that. The AI council is a different kind of check. Not random, but alien. Not human, but capable of processing more perspectives than any human committee could hold in its head at once. Different biases, as you said.
The Ireland example shows that when you give ordinary people structure, information, and time, they make decisions the country accepts. The question is whether AI can provide some of that structure — not replacing the people, but augmenting the deliberation. That's the experiment.
If you're out there and you've got a hundred friends with unusual tolerance for weird governance experiments and a month to spare — there's your project. We'd love to hear how it goes.
And now: Hilbert's daily fun fact.
Hilbert: In the seventeen twenties, European naturalists first documented that the indigenous Ainu people of Sakhalin Island had observed honeybee populations whose waggle-dance dialects varied so distinctly between the island's eastern and western coasts that bees transplanted from one side to the other failed to recruit nestmates for newly discovered forage sites for up to three foraging cycles before the local colony gradually deciphered the foreign signaling patterns.
The bees had to learn a second language.
Accent reduction classes for pollinators. There's a startup idea.
Thanks to Hilbert Flumingtop for that one. This has been My Weird Prompts. You can find every episode at myweirdprompts dot com, and if you enjoyed this, leave us a review wherever you listen — it helps. We're back next week.