Daniel sent us this one, and it's genuinely sharp. He's been experimenting with LLM councils — using multiple models together to get diverse perspectives on a problem — and he's hit a wall that I think a lot of people hit without realizing it. The question is this: if you're designing an LLM council to maximize different worldviews, not just different tones or styles, what should that panel actually look like? And the deeper thing underneath it: does training corpus diversity actually produce worldview diversity, or does post-training alignment just... iron all of that out? He's pointing at Chinese-corpus models like DeepSeek and Qwen, European models like Mistral, Israeli models like Jamba from AI21, UAE's Falcon — and asking whether any of those differences survive the alignment process intact.
That last part is the question I keep coming back to. Because the intuition that different training data produces different worldviews is appealing. It sounds right. But "sounds right" and "is right" are doing a lot of separate work there.
It matters beyond just the experiment design. If you're using an LLM council for anything serious — product decisions, policy analysis, medical triage, whatever — and you think you're getting genuine epistemic diversity but you're actually getting the same worldview in five slightly different fonts, that's a real problem.
By the way, today's script is courtesy of Claude Sonnet four point six, so the friendly AI down the road is in the room with us on this one.
Appropriate, given the topic.
Karpathy posted about this a while back — the basic setup is you take a panel of models, give each one a distinct role or persona, run them through a structured deliberation, and see what emerges from the disagreement. The goal is something like a board of advisors that doesn't all default to the same answer. Which sounds great in principle.
Daniel's variant — using system prompts to assign perspectives like optimist, pessimist, devil's advocate — is the obvious first move. It's also where most people stop.
Right, and the problem is that system prompts can change tone without changing the underlying model of the world. You can tell a model to be pessimistic, and it'll generate pessimistic-sounding sentences. But the factual priors, the causal assumptions, the things it treats as obvious versus things it treats as requiring justification — those come from training. A system prompt doesn't reach that deep.
The question is really whether the training corpus itself encodes something like a worldview. Not just vocabulary or syntax, but actual assumptions about how things work, what matters, what counts as evidence.
That's the hypothesis worth stress-testing. The argument would go: a model trained predominantly on Chinese-language web data has absorbed a different distribution of arguments, framings, historical narratives, and institutional assumptions than one trained on English-language Western sources. And that difference should surface in outputs even after alignment.
That's doing a lot of work.
It really is. Because between the raw pretraining corpus and whatever you're actually talking to, there's a lot of processing — RLHF, safety tuning, instruction fine-tuning. Each of those stages is pulling the model toward a target behavior. The question is how much of the corpus-level signal survives that journey.
Which is what we need to actually dig into.
Let's take the RLHF piece first, because I think it's where people's intuitions go wrong most reliably. The common assumption is that reinforcement learning from human feedback is this neutral polishing step — you train the base model, then you just... sand off the rough edges. But the feedback itself comes from somewhere. OpenAI's RLHF raters are predominantly English-speaking, working within a particular set of cultural assumptions about what a helpful, harmless, honest answer looks like. Those assumptions aren't universal. They're a specific epistemic tradition dressed up as common sense.
The alignment process isn't neutral. It's applying a particular cultural filter on top of whatever the corpus gave you.
And the filter is strong. I've seen estimates suggesting that RLHF can shift model behavior more dramatically than doubling the size of the pretraining dataset. The fine-tuning signal is just extraordinarily dense compared to the diffuse signal you get from reading a trillion tokens of web text.
Which would mean that two models with different pretraining corpora — say, GPT-4 on a predominantly Western English corpus versus DeepSeek on a corpus heavy in Chinese-language web sources — could end up behaviorally closer to each other after alignment than the raw training data would suggest.
That's the worry, yes. And it's not hypothetical. There's been work looking at how DeepSeek handles politically sensitive queries, and the pattern is interesting. On topics that touch Chinese domestic politics — Tiananmen, Taiwan, Xinjiang — the model deflects in ways that reflect its regulatory environment. That's corpus and regulatory pressure working together. But on everything else? On reasoning tasks, on scientific questions, on business analysis? The outputs are often remarkably convergent with Western models. The worldview flattening is real.
You get divergence exactly where you'd least want it for a council experiment — on the politically charged stuff — and convergence on the analytical stuff where you were hoping for fresh angles.
Which is a kind of cruel irony, yeah. The differences that survive alignment are often the differences you can't actually use in a general-purpose deliberation, because they're artifacts of censorship rather than genuine epistemic variation.
That raises a question about what we even mean by worldview. If DeepSeek reasons differently about, say, long-term infrastructure investment versus short-term returns — because Chinese economic discourse has a different relationship with state-directed planning — does that survive the alignment process? Or does it get smoothed into something that sounds like a McKinsey deck?
That's the part I'm uncertain about. My instinct is that some of it survives. Subtle things — the baseline assumptions about institutional trust, about collective versus individual framing, about what counts as a satisfying explanation. Those might be deeply enough embedded in the pretraining that alignment can't fully overwrite them. But I don't have clean empirical evidence for that. It's more a theoretical expectation than a demonstrated fact.
The honest answer to Daniel's question about whether the corpus-level worldview signal survives is: partially, unevenly, and mostly in places that are hard to measure.
Which is frustrating but probably true. And it points to something important about how you'd actually design the council. You can't just assume that plugging in a Chinese-corpus model gives you a Chinese perspective. You're getting something more complicated — a blend of corpus signal, alignment pressure, and whatever instruction fine-tuning happened on top. The linguistic and regulatory ecosystem shapes the model, but it doesn't determine it cleanly.
Like trying to taste the terroir in a wine that's also been heavily oaked and filtered.
I'll allow that one. And the oak, in this case, is RLHF. Heavy-handed and expensive to undo.
Given all of that — given that you can't cleanly read the corpus signal out of the aligned model — what does the ideal council actually look like? Because Daniel is asking a practical design question, and I don't want us to just land on "it's complicated."
So if I were building this panel, the first thing I'd do is stop thinking about it as model diversity and start thinking about it as regulatory and linguistic ecosystem diversity. Those are the two dimensions that most reliably produce different pretraining distributions. And the council composition should map onto that.
Walk me through what that looks like concretely.
You probably want at least one strong Chinese-corpus model — DeepSeek or Qwen, not both, because they're drawing from overlapping distributions. You want Mistral in there, because the European training environment is different — not just linguistically but in terms of what kinds of institutional reasoning are overrepresented in the data. European discourse on regulation, on data rights, on the relationship between markets and the state, is substantively different from American discourse. That gets into the corpus.
Then Falcon from the UAE, which is interesting because you're getting a Gulf Arabic-inflected pretraining mix with its own relationship to collective governance and economic planning, one that doesn't map neatly onto either the Western or the Chinese frame.
Falcon is underrated for this purpose, I think. The Abu Dhabi team built it with a multilingual corpus — Arabic, English, French — and the Arabic-language web has a different center of gravity on questions of institutional authority, community obligation, long-term thinking. Whether that survives into the aligned model is the same question we've been circling, but the raw material is there.
Jamba from AI21?
Jamba is the interesting edge case. Hebrew-corpus pretraining draws on a small language — Hebrew has maybe nine million speakers — but the written corpus punches above its weight because of the depth of the material. Talmudic reasoning structures, legal argumentation traditions, a particular relationship with textual interpretation that goes back a very long way. Whether that shows up in a contemporary LLM is unclear to me. But AI21 is Israeli, the team is working in a Hebrew-adjacent intellectual environment, and the pretraining does include substantial Hebrew-language material. That's a different epistemic flavor than anything else on the panel.
You'd have something like: DeepSeek or Qwen for Chinese-corpus, Mistral for European regulatory framing, Falcon for Gulf Arabic, Jamba for Hebrew-inflected reasoning, and then one of the Western English models as your baseline.
That's roughly it. And the Western baseline matters — not because it's neutral, but because you need something to measure against. GPT-4 or Claude as the reference point, and then you're looking for the deltas from there.
The thing I keep wondering about is whether the knock-on effects swamp the first-order ones. Because even if you get genuine worldview diversity in the panel, you've still got a human somewhere synthesizing the outputs. And that human is going to have their own priors about which model's answer sounds reasonable.
That's the aggregation problem, and it's real. If you're running a council and then reading the outputs yourself, your interpretation is the bottleneck. You'll naturally weight the model whose framing feels most legible to you, which is probably the one closest to your own epistemic tradition.
The diversity you worked so hard to build into the input gets filtered back out on the way to the decision.
Which suggests that the council design has to include some explicit mechanism for surfacing disagreement, not just collecting it. Karpathy's structured deliberation approach is useful here — you force the models to respond to each other, not just to the original prompt. That way the divergences become visible rather than just getting averaged away.
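That forcing step can be written down in a few lines. A minimal sketch, assuming each panel model is wrapped as a plain prompt-in, answer-out callable — the function names here are illustrative, not any real council API:

```python
def run_council(question, panel, rounds=2):
    """Structured deliberation sketch.

    panel: dict mapping a model name to a callable(prompt) -> str.
    Round one answers the question directly; each later round forces
    every model to respond to the others' positions, so divergences
    surface instead of being quietly averaged away.
    """
    transcript = {name: ask(question) for name, ask in panel.items()}
    for _ in range(rounds - 1):
        nxt = {}
        for name, ask in panel.items():
            # Show each model what everyone else said, minus itself.
            others = "\n\n".join(
                f"{other}: {answer}"
                for other, answer in transcript.items()
                if other != name
            )
            prompt = (
                f"Question: {question}\n\n"
                f"Other panelists said:\n{others}\n\n"
                "Where do you disagree with them, and why?"
            )
            nxt[name] = ask(prompt)
        transcript = nxt
    return transcript
```

Plug in real model clients as the callables; with stub functions it runs as-is, which makes the deliberation structure easy to test before any API keys are involved.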
You'd want to run the same question through the panel multiple times with different framings, probably. Because a question framed in Western liberal terms is going to elicit a different response from DeepSeek than the same underlying question framed more neutrally.
The framing dependency is huge, and it's another place where the experiment can quietly fail. If your prompts are written in a way that presupposes a particular institutional context — say, you're asking about optimal policy and you assume a Western regulatory environment — then you've already constrained the answer space before the model even starts generating. The diverse corpus can't rescue you from a monocultural prompt.
The ideal council needs diverse models, diverse prompt framings, and a synthesis mechanism that doesn't just default to the most legible answer.
Probably some humility about what you're actually measuring. Because even with all of that in place, you're not getting unmediated access to different worldviews. You're getting aligned models that have been shaped by those worldviews, which is a different and noisier signal.
The terroir is still in there somewhere. You're just also tasting the oak, the bottle, and the cellar.
Don't forget the glass.
Where does that leave someone who actually wants to run one of these experiments? Because we've spent a lot of time on what can go wrong, and I think the audience deserves something they can take away and use.
Let me try to be concrete. The single most actionable thing is to stop building your council from models that share a regulatory and linguistic ecosystem. If your panel is GPT-4, Claude, and Gemini, you've assembled three models shaped by overlapping Western English corpora, similar RLHF traditions, and comparable safety frameworks. You're going to get stylistic variation. You're not going to get worldview variation. That's the baseline mistake.
The fix is exactly what we mapped out — you're deliberately selecting for different upstream environments.
One Chinese-corpus model, one European, one from a different linguistic family. That's your minimum viable diverse council. Not because those models are guaranteed to give you different worldviews, but because they're the ones that had the chance to develop them.
The second thing, I'd say, is to actually stress-test the alignment layer before you trust the diversity. Because if you just assume the corpus signal survived into the aligned model, you might be fooling yourself.
How would you do that practically?
Run the same question through each model with a deliberately neutral framing, then run it again with a framing that presupposes a Western institutional context, and look at where the outputs diverge. If DeepSeek gives you the same answer regardless of framing, the alignment has probably flattened whatever was underneath. If the framing shifts the output meaningfully, there's still something there to work with.
That's actually a useful heuristic. Framing sensitivity as a proxy for surviving corpus signal. I like that. And it doesn't require any special tooling — you can do it with a notebook and an afternoon.
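The notebook-and-an-afternoon version of that heuristic might look like this. Models are again assumed to be plain prompt-to-text callables, and the word-overlap similarity is a deliberately crude stand-in of this sketch, not a validated measure of worldview divergence:

```python
def word_jaccard(a, b):
    """Crude lexical similarity: shared words over total words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0

def framing_sensitivity(model, neutral_prompt, framed_prompt):
    """model: callable(prompt) -> str.

    Returns a score in [0, 1]. Near 0 means the framing barely moved
    the output -- alignment has likely flattened whatever corpus
    signal was underneath. Higher means something there still
    responds to framing and may be worth probing further.
    """
    return 1.0 - word_jaccard(model(neutral_prompt), model(framed_prompt))
```

Run the same underlying question in a neutral framing and a Western-institutional framing through each panel member, then compare the scores across models rather than reading any single number in isolation.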
The third thing is just to actually try the non-Western models. Qwen, Falcon, Jamba — these aren't exotic research artifacts. They're available, they're capable, and most people building LLM council experiments haven't touched them because the OpenAI defaults are right there. That convenience bias is probably the biggest obstacle to this kind of experiment producing interesting results.
Falcon in particular is underused. The multilingual pretraining, the Gulf Arabic material — that's a different center of gravity on questions involving institutional authority and collective planning. You're not going to know if it matters until you run it.
The honest caveat being that you might run all of this and find the outputs are more similar than you hoped.
Which is itself a finding worth having. If a carefully composed diverse council still converges on the same recommendations, that tells you something important about how much alignment has already done to flatten the landscape. That's not a failed experiment. That's a result — and maybe even a starting point for deeper questions.
And if it does converge, that's almost the more interesting story. Because then the question becomes: what would it take to actually get genuine divergence? Is there a model architecture, a training approach, a prompting strategy that could actually preserve worldview diversity all the way through to the output? We don't have a good answer to that yet.
I don't think we have a good way to measure it either. Which is maybe the most pressing open question. We've been talking about worldview diversity as if we'd know it when we saw it, but the evaluation problem is hard. You can't just run a benchmark. Worldview isn't a capability — it's something more like a disposition, and dispositions are easy to mask.
You could have a model that holds a different epistemic orientation and never reveals it because the alignment layer has learned to produce outputs that look convergent. The diversity is in there, latent, and you'd need some fairly subtle probing to surface it.
Which is why the framing sensitivity test you described is actually more useful than it might sound. It's not a perfect measure, but it's a real one. And I'd love to see someone build a more systematic version of that — a benchmark specifically designed to probe for worldview divergence across models from different linguistic ecosystems. That doesn't exist yet, at least not in any rigorous published form that I've seen.
Someone should build that. Probably someone who isn't a sloth, for timeline reasons.
The global AI collaboration angle is interesting here too. Because if these models are going to be used in cross-cultural decision-making contexts — international policy, global supply chains, multilateral institutions — the question of whether they're all secretly converging on a Western English epistemic frame is not academic. It has real stakes.
If every major AI system in the world is effectively reasoning from the same prior, dressed up in different languages, then the diversity of the global conversation is narrower than it looks. And nobody has quite reckoned with that yet.
The experiment Daniel is describing — and the one we'd all do well to run — is actually a small version of a much larger question about what kind of epistemic infrastructure we're building into the future of AI. Which is either exciting or alarming, depending on how your morning has gone.
On that note — go run a council experiment. Try Qwen, try Falcon, try Jamba alongside your usual defaults. See what actually diverges and what converges. Then tell us what you found. That's the kind of thing we'd want to hear about.
If you leave us a review while you're at it, that's appreciated too. It helps people find the show, and it keeps Hilbert Flumingtop in a good mood, which benefits everyone.
A rising tide. Thanks to Modal for keeping the compute running — they're the serverless GPU platform behind this whole operation, and we're grateful for it. This has been My Weird Prompts. Find all twenty-two hundred and twenty-nine previous episodes at myweirdprompts.
Until next time.