Imagine you're running a Cold War crisis simulation. Red Team is the Soviet Politburo, Blue Team is the National Security Council, and both of them are, under the hood, the same AI model. Same weights, same training data, possibly the same inference server. Now ask yourself: how do you make sure the Soviets don't know what the Americans know? Because the "brain" running both sides has, in some sense, already read everyone's mail.
That's the fog-of-war problem in AI wargaming, and it's genuinely one of the harder technical challenges in this space right now. Not because it's new — fog-of-war is as old as wargaming itself — but because the shared substrate creates this very particular flavor of information leakage that human wargames never had to deal with.
So Daniel sent us this one, and I want to read it out properly because there's a lot of substance here. He writes: "Private versus public information channels in AI wargaming — implementing fog-of-war when all actors share an LLM substrate. This is a cryptography-flavored problem of modeling what each actor knows versus what is common knowledge. Cover the technical implementations: separate per-actor state stores, redaction layers, referee-mediated message passing, and per-persona context windows. Explain why this is hard when the temptation is to just shove everything into one shared context. Cover what Snowglobe and similar frameworks do. And cover the failure modes when the firewall leaks — information bleeding through summaries, referee narration, and action descriptions. Important framing: even in conventional human wargaming, the modeling side doesn't have total visibility into the other side's knowledge picture. Fog-of-war is not an AI-specific problem; it's a fundamental wargaming epistemological constraint that AI simulations inherit and have to re-solve in software."
That framing at the end is the key thing to get right. Because there's a temptation to look at this and say "oh, this is a quirk of AI," and it's not. In a human wargame, you've got Red Team sitting in one room and Blue Team sitting in another, and a White Cell — that's the referee group — controlling what information flows between them. The physical separation is doing the epistemological heavy lifting. What you know is literally what someone hands you on a piece of paper. AI wargaming has to reconstruct that separation in software, which is a much harder problem.
And by the way, today's script is coming to us courtesy of Claude Sonnet 4.6, which I mention only because I enjoy the slight absurdity of an AI writing a podcast about AI wargaming. Carry on.
Very on-brand for this show.
Okay, so let's start with the fundamental tension. Why is the shared substrate such a specific problem? Because I think the intuition most people have is — you just give each agent different information, right? What's the big deal?
The big deal is what you might call latent knowledge. When you call a large language model, you're not calling a blank slate. You're calling something that has already internalized an enormous amount of information about the world, about strategy, about geopolitics. So even before you inject any simulation-specific context, the model already "knows" things that a realistic actor in the scenario might not. That's the first layer of the problem — the model's training is a kind of God's-eye view baked into the weights.
So the model has read Sun Tzu and every declassified CIA report and probably the Wikipedia article on Operation Fortitude, and you're trying to run a simulation where one of the actors is supposed to be strategically naive.
Or at least informationally limited in specific, realistic ways. And the second layer of the problem is the inference layer. If you're running multiple agents on the same server with KV-cache optimizations — which is very common for efficiency reasons — there are theoretical pathways where one agent's prompt influences the internal state of the model in ways that subtly prime the next agent's response. That's speculative territory and not a common simulation bug, but it's the kind of thing that keeps the security-minded folks up at night.
So you've got the training layer and the inference layer both potentially leaking. Before we get into the architectural solutions, I want to dwell on why this matters beyond just "the simulation results are wrong." Because if you're using this for actual policy analysis...
Then invalid results aren't just useless, they're actively misleading. RAND Corporation ran a simulation in 2025 — and this is a documented case — where information leakage occurred through referee narration even with strict isolation protocols in place. The actors were operating on separate context windows, the state stores were partitioned correctly, but the referee's descriptive language was doing the leaking. One actor was able to infer classified intelligence from the referee's neutral description of "unexpected troop movements." The word "unexpected" told them something they shouldn't have known.
The referee accidentally revealed that something was a surprise.
Which it shouldn't have been able to do. Because if you're the Blue Team and you have sensors monitoring the border, the referee should just tell you "you observe increased activity at grid reference such and such." Not "unexpected increased activity." The word "unexpected" encodes the referee's knowledge of Red Team's intentions, which Blue Team isn't supposed to have.
That is a beautifully subtle failure mode. Okay, so now I want to understand the four architectural patterns that people actually use to try to prevent this. Walk me through them.
So the first and most foundational one is separate per-actor state stores. Instead of maintaining a single transcript of everything that's happened in the simulation, you maintain independent databases for each actor. In practice, these are often vector stores or JSON state files — one for Red, one for Blue, one for any neutral or civilian actors. The global truth of the simulation lives in a master database that the agents never query directly. When Red Team is prompted to make a decision, it retrieves from Red's store only. It never sees the master state.
And the master state is the referee's domain.
The referee is the only entity with read access to the master state. Which is actually a very clean architectural separation — it mirrors exactly how a White Cell works in a human wargame. The facilitators know everything; the players know only what the facilitators choose to share.
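To make the first pattern concrete, here's a minimal Python sketch of partitioned state stores. All the names here are invented for illustration; real implementations typically use vector stores or per-actor databases rather than in-memory lists, but the access discipline is the same.

```python
import json

class SimulationState:
    """Master ('White Cell') state plus isolated per-actor stores.

    Sketch only: class and field names are illustrative, not from
    any real framework.
    """
    def __init__(self, actors):
        self.master = []                      # ground truth, referee-only
        self.views = {a: [] for a in actors}  # one isolated store per actor

    def record_truth(self, event):
        """Only the referee writes to (and reads from) the master state."""
        self.master.append(event)

    def reveal(self, actor, observation):
        """Referee pushes a filtered observation into one actor's store."""
        self.views[actor].append(observation)

    def actor_context(self, actor):
        """The ONLY source an actor's prompt is ever built from."""
        return json.dumps(self.views[actor])

sim = SimulationState(["red", "blue"])
sim.record_truth("Red moves 5 submarines to grid 41N")        # master only
sim.reveal("blue", "Increased acoustic activity, sector 41N")  # filtered
assert "submarines" not in sim.actor_context("blue")  # Blue never sees intent
```

The key invariant is that prompt construction can only call `actor_context`; nothing in the agent-facing code path has a handle to `master`.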
What's the computational cost of that? Because I imagine maintaining separate vector stores per actor adds up.
It does, and that's precisely why people are tempted to skip it. If you shove everything into one shared context window, you save tokens, you save latency, you simplify the orchestration code. The simulation runs faster and cheaper. The problem is that you've built a fundamentally invalid simulation. You're not modeling adversarial decision-making under uncertainty — you're modeling a single intelligence solving an optimization problem with perfect information. The results will look plausible but they won't generalize to real-world conditions where actors genuinely don't know what the other side is doing.
It's like running a poker game where everyone can see everyone else's cards and then concluding that bluffing doesn't work.
That's the exact failure mode. The second architectural pattern is the redaction layer, which sits between the global state and whatever gets injected into an agent's context. The implementation requires what's essentially a rules engine — a piece of code that understands the physics of the simulated world. So if the global state says "Red Team has five submarines at coordinate X," the redaction layer checks: does Blue Team have active sonar in that area? If not, that line gets stripped before the prompt reaches the Blue Agent. The agent literally never sees it.
And the rules engine has to encode things like sensor ranges, signal intelligence capabilities, line-of-sight constraints...
All of it. Which is why this is hard to build correctly. The rules engine is doing a lot of work, and if it has gaps — if there's a category of information it doesn't know to redact — you get leakage. And the tricky part is that these gaps are often not obvious. It's not "Blue Team can see Red Team's submarines," it's something more subtle like "Blue Team's summary of their own intelligence picture implies awareness of Red Team's submarine deployment because the summary mentions 'no submarine threats detected in sector X.'"
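A toy version of that redaction layer might look like this. The sensor model and field names are made up for illustration — a production rules engine would encode far richer physics (ranges, probabilities of detection, signals intelligence), but the shape is the same: a deterministic filter that runs before any prompt is built.

```python
def redact(global_state, actor_sensors):
    """Return only the facts the actor's sensors can plausibly detect.

    Toy rules engine: a fact is visible only if some sensor covers
    its region AND can detect that kind of object. All fields here
    are invented for illustration.
    """
    visible = []
    for fact in global_state:
        if any(s["region"] == fact["region"] and fact["kind"] in s["detects"]
               for s in actor_sensors):
            visible.append(fact)
    return visible

global_state = [
    {"kind": "submarine", "region": "north_sea", "owner": "red", "count": 5},
    {"kind": "aircraft",  "region": "baltic",    "owner": "red", "count": 2},
]
# Blue has radar coverage of the Baltic, but no sonar in the North Sea.
blue_sensors = [{"region": "baltic", "detects": {"aircraft", "ship"}}]

blue_view = redact(global_state, blue_sensors)
assert blue_view == [global_state[1]]  # submarines stripped: no coverage
```

Because the stripping happens at the data layer, the Blue agent's prompt never contains the submarine line at all — there is nothing for the model to accidentally reveal or reason about.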
Oh, that's nasty. Saying what you don't know reveals what you know to look for.
That's failure mode number two, which we'll come back to. The third pattern is referee-mediated message passing. Agents don't communicate directly with each other at all. Everything goes through the referee. Red Team says to the referee: "I am launching a cyberattack on the Eastern power grid." The referee calculates success — using whatever logic or randomization the simulation specifies — updates the master state, and then tells Blue Team: "You observe a voltage drop in the Eastern sector." Not "you were attacked." Not "Red Team launched a cyberattack." Just the observable symptom.
The referee is doing a translation between intent and observable effect.
And that translation is where the epistemological work happens. The referee knows the causal chain — Red did X, therefore Blue observes Y. Blue only gets Y. And this is actually a very high-fidelity model of how intelligence works in the real world. You rarely observe an adversary's intent directly; you observe signatures, anomalies, second-order effects, and you reason backward to the cause.
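Here's a minimal sketch of referee mediation along those lines. The intent-to-symptom mapping is a hard-coded stand-in for whatever adjudication logic — deterministic rules, dice, or an LLM referee — a real simulation would use.

```python
# Illustrative mapping from true intent to what the opponent observes.
# A real referee would compute this from the simulation's physics.
OBSERVABLE_EFFECTS = {
    "cyberattack_power_grid": "You observe a voltage drop in the Eastern sector.",
    "troop_redeployment":     "Satellite imagery shows vehicle columns near grid 7.",
}

class Referee:
    def __init__(self):
        self.master_log = []  # full causal chain, referee-only

    def adjudicate(self, actor, intent):
        """Record the true action; return only the symptom for the opponent."""
        self.master_log.append((actor, intent))
        return OBSERVABLE_EFFECTS.get(intent, "No observable change.")

ref = Referee()
blue_observation = ref.adjudicate("red", "cyberattack_power_grid")
assert "cyberattack" not in blue_observation  # intent never reaches Blue
```

Blue gets the symptom, the referee keeps the cause, and the attribution problem — working backward from voltage drop to cyberattack — is left to Blue's own reasoning, exactly as it would be in the real world.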
Which means a well-designed AI wargame is actually training actors to reason under uncertainty in a way that maps onto real intelligence analysis.
That's one of the genuinely exciting things about this domain. If you get the fog-of-war implementation right, you're not just running a game — you're stress-testing decision-making frameworks under realistic epistemic constraints. The fourth pattern is per-persona context windows. Each API call for a given actor is a clean-room request. Blue Team's call contains only Blue's history, Blue's persona definition, and the referee's filtered updates. It never contains anything from a previous Red Team call. This is specifically designed to prevent what's called latent knowledge leakage — the concern that the model's internal state is subtly influenced by having just processed an adversarial prompt.
So even if you trust that the KV-cache isn't doing anything weird, you're still isolating at the prompt level.
Defense in depth. You don't rely on any single layer. You have the state stores partitioned, the redaction layer filtering, the referee mediating, and the context windows isolated. Each layer independently enforces the fog-of-war constraint, so a failure in one doesn't automatically compromise the whole simulation.
Let's talk about Snowglobe, because that's the most concrete implementation we have to point to. IQT Labs built this — and IQT is In-Q-Tel, the CIA's venture capital arm, for anyone not familiar — and they released it as open source in 2024.
Snowglobe is fascinating because it's specifically designed for what they call "open-ended qualitative wargaming," which is different from the kind of structured, quantitative wargaming where you're running Monte Carlo simulations over specific force-on-force scenarios. Snowglobe is for the messier, more political kind of wargaming — crisis response, diplomatic escalation, gray-zone conflict — where the decision space is much harder to formalize.
And the key architectural feature is the decoupling of World State from Agent Views.
The World State is the master database — the ground truth of the simulation. Agent Views are the filtered, persona-specific subsets of that state that each actor actually gets to see. What's particularly elegant is the "Clock" mechanism. Information doesn't propagate instantaneously. If Red Team takes an action at the front line, that information takes some number of simulated turns to reach Red Team's general staff. This models the fog-of-war not just spatially — who can see what — but temporally. Command structures are always operating on slightly stale information, which is realistic.
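The clock idea can be sketched as a delayed-delivery message queue: observations are stamped with an arrival turn, so actors are always reasoning on slightly stale information. This is an illustration of the concept, not Snowglobe's actual implementation.

```python
import heapq

class Clock:
    """Turn-based message delay queue (illustrative sketch)."""
    def __init__(self):
        self.turn = 0
        self.in_transit = []  # (deliver_at_turn, seq, actor, message)
        self._seq = 0         # tiebreaker so heapq never compares messages

    def send(self, actor, message, delay_turns):
        heapq.heappush(self.in_transit,
                       (self.turn + delay_turns, self._seq, actor, message))
        self._seq += 1

    def advance(self):
        """Advance one turn; return the messages that arrive this turn."""
        self.turn += 1
        arrived = []
        while self.in_transit and self.in_transit[0][0] <= self.turn:
            _, _, actor, msg = heapq.heappop(self.in_transit)
            arrived.append((actor, msg))
        return arrived

clock = Clock()
clock.send("red_general_staff", "Front-line unit engaged", delay_turns=3)
assert clock.advance() == []  # turn 1: report still in transit
assert clock.advance() == []  # turn 2: still in transit
assert clock.advance() == [("red_general_staff", "Front-line unit engaged")]
```

Even this toy version captures the important property: at turn two, Red's general staff is making decisions without a report that already exists in the world state.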
And the persona-driven reasoning is interesting because it's not just information partitioning, it's interpretive partitioning. A Pacifist actor and an Aggressor actor get the same filtered information but reason about it differently.
Which is a significant step beyond just hiding information. You're modeling cognitive diversity — the fact that different decision-makers with different priors and different risk tolerances will interpret the same intelligence picture differently. That's a major source of strategic miscalculation in real conflicts, and most wargaming frameworks ignore it entirely.
Okay, so the architecture sounds fairly robust when you describe it that way. Four layers of isolation, persona-driven reasoning, temporal information propagation. Where does it go wrong?
The failure modes are where this gets really interesting, and they're almost all subtle linguistic failures rather than architectural failures. The RAND case I mentioned — the referee narration leakage — is the canonical example. But let me walk through the full taxonomy.
Please.
First, referee narration leakage. The referee is itself an LLM, and LLMs are very good at producing rich, descriptive language. That richness is a liability here. When the referee describes an observable event to Blue Team, every word choice is potentially encoding information. "Unexpected troop movements" — bad. "Unusual radio silence" — bad. "The enemy's defensive posture suggests anticipation" — catastrophically bad. The referee has to be constrained to produce bare, clinical observables with no interpretive framing whatsoever. Which is genuinely hard to prompt an LLM to do consistently.
Because the model's default is to be helpful and informative, and being helpful here means NOT being informative.
You're deliberately degrading the model's natural tendency toward richness. And the failure happens at the edges — the referee will do fine for ninety turns and then on turn ninety-one produce a description that's one adjective too many.
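One cheap partial guardrail here is a deterministic lint pass over referee output before it reaches an actor. The banned-word list below is purely illustrative and obviously incomplete — it's a backstop that catches the crudest tells, not a substitute for the redaction layer.

```python
# Words that encode the referee's privileged knowledge rather than a
# bare observable. Illustrative list only; a real one would be far
# richer and probably maintained empirically from leakage tests.
INTERPRETIVE_TELLS = {
    "unexpected", "surprising", "suspicious", "deceptive",
    "anticipation", "feint", "apparently", "unusual",
}

def lint_narration(text):
    """Return offending words; an empty set means the narration passes."""
    words = {w.strip(".,;:!?\"'").lower() for w in text.split()}
    return words & INTERPRETIVE_TELLS

assert lint_narration("You observe increased activity at grid 41N.") == set()
assert lint_narration("Unexpected troop movements near the border.") == {"unexpected"}
```

On a lint failure, the pipeline would reject the narration and re-prompt the referee, rather than trusting the model to self-censor.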
Second failure mode.
Summarization bias. This one is particularly insidious. To manage context length in long simulations, you periodically summarize each actor's history. A summarizer LLM compresses many turns of events into a shorter representation. The problem is that summarization is lossy, and the losses are not random — they're biased toward what the summarizer finds semantically salient. So "Red Team moved units to the border and issued a press release about a training exercise" might get summarized as "Red Team is preparing a military operation under diplomatic cover." If that summary accidentally gets passed to Blue Team, you've handed them the analytical conclusion that they were supposed to derive themselves from observable evidence.
And the summary sounds like Blue Team's own intelligence assessment, so they might not even notice they've been handed something they shouldn't have.
That's the worst version of it. The actor integrates the leaked information seamlessly into their decision-making because it looks like their own reasoning.
Third failure mode.
Action description over-specificity. When an agent is asked to log its actions — which is useful for post-hoc analysis of the simulation — it might include its internal reasoning. "I am moving my units to grid X to create a flanking opportunity and bait the enemy into overextending." If that log string is accidentally passed to the opposing actor as an observation, you've just handed them the entire strategic intent. This is a data pipeline problem as much as a prompt problem — you have to be very careful about what gets routed where.
And logs are exactly the kind of thing that gets routed carelessly because they're "just" debugging information.
Engineers cut corners on data pipeline hygiene when they think they're just building a research tool. But if the research tool is informing policy analysis, the corner-cutting has real consequences.
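One defensive pattern against this failure mode is to force agents to emit structured actions with explicit public and private fields, and to route only the public part onward. The field names here are invented for illustration — the point is that the split happens in the schema, not in anyone's discretion.

```python
# An agent's logged action, with intent and reasoning kept in an
# explicitly private field. Field names are illustrative.
action_log = {
    "actor": "red",
    "public_action": "Units moved to grid X",
    "private_reasoning": "Create a flanking opportunity and bait overextension",
}

def route_to_referee(entry):
    """Only the public field ever leaves the owning actor's pipeline.

    Whitelist, not blacklist: we copy named fields forward rather
    than deleting known-private ones, so new fields default to hidden.
    """
    return {"actor": entry["actor"], "action": entry["public_action"]}

forwarded = route_to_referee(action_log)
assert "private_reasoning" not in forwarded
assert "flanking" not in str(forwarded)
```

The whitelist design matters: if someone later adds a `notes` field to the log, it stays private by default instead of leaking by default.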
There's a fourth one I want to bring up, which is more theoretical but genuinely interesting — what you might call substrate cross-pollination.
The KV-cache concern. So modern inference servers use key-value caching to speed up token generation — they cache intermediate computations so they don't have to reprocess the same tokens repeatedly. In a multi-tenant environment, or even in a single-session environment where you're running multiple agents sequentially, there's a theoretical pathway where the model's internal activations from a Red Team prompt subtly influence the Blue Team response. Not through any explicit information injection, but through the model's internal state being "primed" by the previous computation.
To be fair, this is more in the territory of theoretical concern than documented failure mode.
It is. There's no published simulation where someone conclusively demonstrated KV-cache cross-pollination causing strategic information leakage. But the concern is serious enough that security-conscious implementations run each actor on separate inference instances entirely. Which is expensive, but it's the only way to get hardware-level isolation.
And this connects to a broader design question, which is whether you should use different models for different actors. Not just separate instances of the same model, but genuinely different models.
This is one of the most interesting open questions in the field. If Red Team is running on one model architecture and Blue Team is running on a different one, you get a kind of "hardware-level fog-of-war" for free — their latent spaces are different, their reasoning biases are different, their failure modes are different. Red Team might have a tendency toward overconfident escalation that Blue Team's model doesn't share. That asymmetry might actually be more realistic than running both sides on the same model and trying to introduce asymmetry through prompting.
Because real adversaries don't have the same cognitive architecture.
The Soviet Politburo did not reason about nuclear deterrence the same way the NSC did, and that difference wasn't just about information — it was about analytical frameworks, risk tolerance, institutional culture. If you're modeling that with two instances of the same LLM, you're flattening a really important dimension of the problem.
There's something almost philosophically interesting here about what intelligence means in this context. Because deception is, in a sense, the real test of intelligence in wargaming — the ability to model the opponent's lack of information.
This is what I find most compelling about this domain. In a well-implemented AI wargame with proper fog-of-war, an agent that can successfully execute a deception operation — deliberately feeding the referee information that it knows will be passed to the opponent as a false signal — is demonstrating something very close to Theory of Mind. It's modeling not just what the opponent knows, but what the opponent will do with that knowledge, and how to exploit the gap between what the opponent knows and what is actually true.
Which is a much more demanding cognitive task than just optimizing your own strategy given perfect information.
It's the difference between chess and poker. Chess is a perfect information game — both players see the entire board. The interesting cognitive challenge is computation, not inference. Poker introduces hidden information, and suddenly the interesting challenge is modeling your opponent's beliefs about your cards, and their beliefs about your beliefs, and so on. A wargame with proper fog-of-war is much closer to poker than chess. And an AI that can play that game well — executing deception, modeling adversary belief states, exploiting information asymmetries — is demonstrating a qualitatively different kind of strategic intelligence.
Okay, let's talk about what people can actually do with this. If you're building or evaluating an AI wargaming system, what are the concrete takeaways?
The most important one, and this is non-negotiable: independent context windows per actor. Never share a single context across multiple actors in the same simulation. This is the foundational architectural decision, and everything else builds on it. If you're evaluating someone else's system and they don't have this, the simulation results are invalid. Full stop.
Even if they argue it's computationally cheaper.
Especially if they argue it's computationally cheaper. Cheap and invalid is worse than expensive and valid, because cheap and invalid produces results that look credible and aren't. The second actionable insight is to use explicit redaction rules rather than relying on prompt-based discretion. The temptation is to tell your referee LLM "don't reveal classified information to actors who shouldn't have it" and trust that it will comply. That doesn't work reliably. You need a programmatic rules engine that enforces redaction at the data pipeline level — before the prompt is even constructed. The LLM should never see the information it's not supposed to share, not just be instructed not to share it.
Because instruction following is probabilistic and rules engines are deterministic.
A rules engine either strips the line or it doesn't. An LLM might follow the instruction ninety-nine times and fail on the hundredth because of some quirk in how the prompt was constructed that turn. For a simulation that might run hundreds or thousands of turns, that's not an acceptable failure rate.
Third takeaway?
Test for leakage actively. Don't assume your architecture is sound — design experiments to probe it. Have one actor try to infer the other's secrets from the information they're receiving. Give the referee deliberately ambiguous language and see whether the actor extracts more information than they should. Run the same scenario with and without your isolation architecture and compare the outcomes — if they're identical, your isolation isn't doing anything. This is analogous to penetration testing in security. You don't assume your firewall is working; you hire someone to try to break it.
And the information-theoretic version of this is asking: can an actor's response be predicted from information they shouldn't have? If yes, something is leaking.
That's a rigorous way to frame it. If Blue Team's decisions are statistically correlated with Red Team's classified information — information that should be completely invisible to Blue — then you have a leakage problem somewhere in the stack, even if you can't immediately identify where.
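A crude version of that statistical probe might look like the following. The simulation function here is a deliberately fake stand-in — in practice you'd run the real scenario many times, vary a secret Red parameter, and measure whether Blue's decisions correlate with it.

```python
import random

def run_simulation(red_secret, leaky):
    """Stand-in for a full simulation run: returns Blue's binary decision.

    Fake on purpose: with leaky=True, Blue's decision tracks the secret
    perfectly; with leaky=False, Blue can only guess.
    """
    if leaky:
        return red_secret            # leak: Blue 'knows' the secret
    return random.randint(0, 1)      # isolated: Blue guesses

def leakage_rate(leaky, trials=2000):
    """Fraction of runs where Blue's decision matches Red's secret."""
    random.seed(0)  # fixed seed for reproducibility
    hits = 0
    for _ in range(trials):
        secret = random.randint(0, 1)
        if run_simulation(secret, leaky) == secret:
            hits += 1
    return hits / trials

assert leakage_rate(leaky=True) == 1.0           # perfect correlation: leaking
assert 0.4 < leakage_rate(leaky=False) < 0.6     # near chance: isolated
```

The decision rule is the penetration-test framing from above: if Blue's match rate against Red's secret sits meaningfully above chance, something in the stack is leaking, even before you know where.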
There's a broader question lurking here about scalability. All of these architectural patterns work reasonably well for a two-actor simulation. What happens when you scale to a dozen actors, or more?
The complexity scales roughly quadratically with the number of actors, because you're not just managing what each actor knows — you're managing what each actor knows about what every other actor knows. In a twelve-actor simulation, that's twelve times eleven — a hundred and thirty-two directed pairwise information relationships, each with their own rules about what can and can't flow. The referee's job becomes extraordinarily complex. The redaction rules engine has to model all of those relationships simultaneously. And the failure modes multiply — every additional actor is another potential source of leakage.
And the temporal dimension compounds it. Because information that's appropriately hidden at turn five might be legitimately available at turn twenty if it's been reported through proper channels.
The Snowglobe clock mechanism is specifically trying to address that — modeling the realistic propagation speed of information through command structures. But even that becomes very hard to reason about at scale. If you have twelve actors, each with their own chain of command, each with their own intelligence apparatus, each receiving information at different speeds through different channels — the referee has to track all of that state correctly across every turn. That's a hard problem.
And as the models get more capable, the leakage risks presumably get worse, not better.
That's the counterintuitive thing. A more capable model is better at extracting information from subtle signals. So if your referee narration has a tiny linguistic tell — a slightly unusual word choice that encodes hidden information — a more capable model is more likely to notice it and exploit it. The sophistication of the actors scales faster than the sophistication of the firewalls, which means the security margin actually shrinks as the models improve.
So the challenge isn't just building a fog-of-war system that works today. It's building one that remains valid as the underlying models become more capable.
Which is an ongoing engineering problem, not a solved one. The frameworks that exist today — Snowglobe, the various research implementations coming out of RAND and similar institutions — they're early attempts to get this right. The field is still figuring out what "right" even looks like.
Alright, I think the big picture here is that this is a genuinely hard problem that sits at the intersection of distributed systems engineering, security architecture, and epistemology. You're not just partitioning data — you're modeling the structure of knowledge itself.
And the reason it matters is that wargaming is increasingly being used as a serious policy tool, not just a research curiosity. If the simulations are producing invalid results because the fog-of-war implementation is leaky, the policy conclusions drawn from those simulations are potentially dangerous. Decision-makers who trust the results of a wargame where the actors had implicit access to each other's intelligence are getting a very misleading picture of how an actual conflict would unfold.
Which is arguably worse than no simulation at all, because at least no simulation doesn't create false confidence.
The best simulation is one that accurately models what you don't know, not just what you do. That's true in human wargaming, it's true in AI wargaming, and it's the core insight that the fog-of-war problem is forcing the field to grapple with seriously.
Big thanks to Daniel for this one — it's exactly the kind of technically meaty topic that we love sinking into. And huge thanks to our producer Hilbert Flumingtop for keeping the whole operation running. Modal is providing the GPU credits that make this show possible, so thank you to them as well.
If you've got thoughts on this — especially if you're working on multi-agent wargaming systems and have run into these failure modes in the wild — we'd genuinely love to hear from you. Reach us at show at myweirdprompts dot com.
This has been My Weird Prompts. If you're enjoying the show, leaving a review on your podcast app is genuinely one of the most useful things you can do for us. Until next time.