#2131: In-Q-Tel's Open-Source Wargames

In-Q-Tel is on GitHub. Explore the IC's strategic investment arm and its use of open-source AI for wargaming.

Episode Details

Episode ID: MWP-2289
Duration: 37:16
Pipeline: V5
TTS Engine: chatterbox-regular
Script Writing Agent: Claude Sonnet 4.6

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

In-Q-Tel on GitHub: Inside the Intelligence Community's Open-Source AI Strategy

The phrase "In-Q-Tel is on GitHub" is jarring at first glance, conjuring images of secret backdoors and hidden code. In-Q-Tel (IQT) is the non-profit strategic investor chartered by the CIA in 1999 to serve the broader US intelligence community: not the CIA itself, but its venture arm. Its public presence on GitHub is, in reality, a deliberate, strategic choice that reveals a fundamental shift in how the U.S. intelligence community (IC) operates in the 21st century. The IC is no longer building its most advanced technology in secret silos; it is experimenting in the open, leveraging the global developer community, and investing heavily in the commercial AI sector. The key to understanding this strategy is a project called Snowglobe.

Wargaming with AI Personas

Snowglobe isn't a classified program; it's a public repository. Built by IQT Labs (IQT's in-house R&D arm), it's a multi-agent system that uses large language models to simulate geopolitical wargames. But it's not a traditional simulation that outputs dry statistics like "a 40% chance of escalation." Instead, Snowglobe generates narrative. It creates the diplomatic cables, the miscommunications, and the escalating tension that defines a real-world crisis. It's less like a chess engine and more like a high-stakes improv troupe, providing the texture of a crisis, not just the odds.

In exercises run with IC partners, human participants are joined by AI personas like "The Pacifist," "The Aggressor," and "The Tactician." These aren't just simple bots; they're tuned to embody specific strategic dispositions, acting as both assistants and adversaries. However, this reveals a profound, unresolved problem. Human players begin to defer to the AI personas, particularly the Aggressor and the Pacifist, in ways that aren't always analytically justified. The AI's framing starts to dominate the room. The core question becomes: are these tools stress-testing human judgment, or are they subtly replacing it by shaping the "Overton window" of what options feel thinkable?
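The persona mechanism described above can be sketched as agents conditioned on per-persona system prompts, each taking a turn against a shared narrative. This is a hypothetical sketch, not Snowglobe's actual code: the `generate` stub stands in for a real LLM call, and the persona prompts are invented.

```python
# Hypothetical sketch of persona-conditioned wargame agents.
# `generate` stands in for a real LLM call; here it is a
# deterministic stub so the turn loop can be shown end to end.

PERSONAS = {
    "The Pacifist": "Always seek de-escalation and diplomatic off-ramps.",
    "The Aggressor": "Favor escalation and visible displays of resolve.",
    "The Tactician": "Weigh concrete military and logistical trade-offs.",
}

def generate(system_prompt: str, transcript: list[str]) -> str:
    """Stub for an LLM call: (persona prompt + shared history) -> next move."""
    name = system_prompt.split(":")[0]
    return f"[{name}] responds to move {len(transcript) + 1}"

def run_round(transcript: list[str]) -> list[str]:
    """One round: each persona reacts to the shared narrative so far."""
    for name, disposition in PERSONAS.items():
        prompt = f"{name}: {disposition}"
        transcript.append(generate(prompt, transcript))
    return transcript

# Two rounds: the narrative accumulates, and each persona sees (and is
# shaped by) what every other persona said before it -- the same channel
# through which a persona's framing can come to dominate the exercise.
narrative = run_round(run_round([]))
```

Because every agent conditions on the full shared transcript, a strongly framed persona turn becomes input to every later turn, which is the structural root of the deference problem described above.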

The Ecosystem: In-Q-Tel, IQT Labs, and IARPA

This open-source approach is enabled by a complex ecosystem of public-private partnerships. It's crucial to distinguish between two key entities:

  • In-Q-Tel (IQT): A non-profit strategic investor chartered by the CIA in 1999, now serving the broader IC (CIA, NSA, NGA, DIA, and adjacent DoD/DHS customers). Its job is to identify promising commercial technology with intelligence applications, invest in the companies building it, and get that tech into the hands of the IC. The model relies on "dual-use" products — tools that have a viable commercial market (like Palantir or the original Keyhole, which became Google Earth) so they don't become slow, dependent government contractors.
  • IQT Labs: IQT's in-house R&D arm. This is where engineers and data scientists take ideas from IQT's portfolio and stress-test them for specific intelligence use cases. They build, experiment, and prototype, often in the open, to leverage the global developer community.

This model is replicated across the IC. IARPA (Intelligence Advanced Research Projects Activity), modeled on DARPA, funds high-risk, high-payoff research. Its programs, like the recently concluded TrojAI, focus on critical security challenges such as detecting "backdoors" in neural networks — a scenario where a model behaves normally in testing but has been trained to fail under specific, adversarial conditions.

Accelerators and Geospatial Intelligence

The National Geospatial-Intelligence Agency (NGA) has its own version of this model. Declaring 2025 its "Year of AI," the NGA launched an accelerator with Capital Innovators, providing $100,000 grants to startups focused on integrating multimodal AI into geospatial intelligence. The bottleneck for intelligence isn't collecting satellite imagery anymore — it's analyzing the overwhelming flood of data. By funding a diverse set of startups, the NGA bets on finding the best commercial solutions for fusing satellite, SAR, and signals data without having to build it all in-house.

The Core Tension

The intelligence community has concluded it cannot build cutting-edge AI internally. The talent, compute, and culture required for frontier AI development reside in the commercial sector. Its strategy is to find, fund, and adapt what the market creates. But this introduces a fundamental tension: the very openness and speed that make commercial AI powerful also create new vulnerabilities. The IC is experimenting with how to maintain security and analytical integrity while operating on platforms and with partners that are, by design, open to the world. In-Q-Tel on GitHub isn't an anomaly; it's the new reality.


Transcript

Corn
Here's what Daniel sent us this time. He writes: "Use IQTLabs' Snowglobe project as an entry point into a broader discussion of IQTLabs and the intelligence community's incubators, accelerators, and labs. Snowglobe is an interesting use case for using AI models in policy wargaming. Broaden from there to cover: In-Q-Tel's role as a CIA-backed venture arm, other IC incubators and accelerators and how they experiment with AI models, public-private partnerships as the mechanism for this experimentation, and how privacy and data handling work when non-security-cleared workers — contractors, open-source contributors, academic partners — are involved in these projects." So, a lot of ground to cover. The CIA is on GitHub. Let's go.
Herman
I mean, that sentence alone should stop people in their tracks. The CIA is on GitHub. The same platform where a teenager in Gdansk is pushing commits to their personal blog engine.
Corn
And that's not a bug in the system, apparently. That's the design.
Herman
That's the design. And Snowglobe is the perfect lens for understanding why, because it's not some classified black program. It's a public repository. You can go look at the code right now. IQTLabs built an open-source, large language model-powered multi-agent wargaming system and put it on the open internet, and that's not a leak — that's intentional strategy.
Corn
So let's start there. What actually is Snowglobe? Because the name is doing a lot of work and I want to make sure people understand what we're actually talking about.
Herman
Right. So Snowglobe is a multi-agent system where large language models simulate the participants in a geopolitical wargame. And I want to be precise about what makes it different from older simulation approaches, because this isn't a Monte Carlo simulation giving you probability distributions. It's not outputting "thirty-seven percent chance of escalation." The AI generates narrative. It generates the friction — the diplomatic language, the misunderstandings, the escalating back-and-forth that characterizes real crises.
Corn
So it's less like a chess engine and more like a very high-stakes improv troupe.
Herman
That's actually a decent way to put it. And if you want a concrete illustration — imagine you're running a Taiwan Strait scenario. A traditional simulation might tell you there's a forty percent probability of naval confrontation within seventy-two hours. Snowglobe instead generates the actual diplomatic cables, the miscommunications between a nervous junior attaché and a hardline general, the ambiguous statement from a foreign ministry that gets interpreted two different ways by two different teams. It's producing the texture of a crisis, not just the statistics of one.
Corn
Which is a fundamentally different kind of useful. Statistics tell you what might happen. Narrative tells you how it feels to be inside it when it's happening.
Herman
And the April 2025 joint exercise with the CIA is the clearest demonstration of what that means in practice. Six human participants, and then a suite of AI personas running alongside them — The Pacifist, The Aggressor, The Tactician. These aren't just labels. Each persona is tuned to embody a particular strategic disposition, and they function as both assistants and adversaries, subtly shaping how the human teams reason through the scenario.
Corn
Okay, that raises a flag for me immediately. If I'm a human player in that wargame and I have an AI assistant that's been tuned to be, say, cautious and de-escalatory, am I actually reasoning through the crisis or am I being nudged toward a particular conclusion?
Herman
You've put your finger on what I think is the most interesting unresolved question in the whole project. The CIA's own write-up in their Studies in Intelligence publication from December 2025 notes that human players in the trials began deferring to the AI personas — particularly the Aggressor and the Pacifist — in ways that were not always analytically justified. The AI's framing started to dominate the room.
Corn
So the wargame is supposed to stress-test human judgment, but the AI might actually be replacing it.
Herman
Or at minimum, it's shaping the Overton window of what options feel thinkable. Which is a profound problem if you're trying to use these exercises to prepare real decision-makers for real crises. You want to stress-test their reasoning, not train them to defer to a language model's priors.
Corn
And those priors come from... what? Training data? Fine-tuning choices made by engineers at IQTLabs?
Herman
Both. And this is where the organizational structure of IQTLabs matters. Because IQTLabs is not In-Q-Tel. People often use those names interchangeably and they shouldn't. In-Q-Tel was founded in 1999 as a CIA-backed non-profit venture capital firm. Its job is to find commercial technology that has intelligence applications, invest in the companies building it, and get that technology into the hands of the IC. Palantir is the canonical example — early In-Q-Tel investment, commercial product that also serves the government. Keyhole, which became Google Earth, same story.
Corn
So In-Q-Tel is writing checks.
Herman
In-Q-Tel is writing checks and taking board seats. IQTLabs is something different — it's the internal sandbox. The engineers and data scientists at IQTLabs take the commercial technology that In-Q-Tel has invested in, or just interesting open-source work, and they stress-test it for specific intelligence use cases. They build, they experiment, they prototype. And crucially, a lot of that work happens in the open, on GitHub, because they're deliberately trying to leverage the global developer community.
Corn
By the way, today's script is being generated by Claude Sonnet 4.6, which feels appropriately on-theme for an episode about AI and the intelligence community.
Herman
Very on-theme. Claude writing a script about AI wargaming tools, delivered through a pipeline that runs on Modal's GPU infrastructure. It's turtles all the way down.
Corn
Anyway. Back to the CIA's venture capital arm. In-Q-Tel made, what, forty-seven AI-related investments between 2020 and 2025?
Herman
Forty-seven documented AI investments in that window, yeah. And that number matters because it tells you something about the pace of change. The IC has concluded it cannot build cutting-edge AI internally. The talent isn't there, the compute procurement timelines are too slow, and frankly the culture of a government intelligence agency is not well-suited to the kind of fast iteration that frontier AI development requires. So the strategy is: find the commercial companies doing the best work, invest in them, and figure out how to use what they're building.
Corn
There's a real tension in that model though. Because the thing that makes a startup move fast is not having to worry about classification requirements and security clearances and ITAR restrictions. The moment you bring the IC into the picture, all of that overhead lands on the company.
Herman
Which is exactly why the "dual-use" requirement is so important to how In-Q-Tel structures its investments. To get IQT funding, your AI product needs a viable commercial market. You can't just build a tool for the CIA — you have to build a tool that also works for, say, corporate strategy consulting or supply chain risk analysis. The intelligence use case is one customer among several. That keeps the startup's incentives aligned with moving fast in the commercial market, which in turn keeps the technology current for the IC.
Corn
And it also means the startup isn't entirely dependent on government contracts, which historically has been a graveyard for innovation.
Herman
The contractor trap, yeah. You get locked into cost-plus contracting, your engineers stop doing interesting work, your best people leave. The dual-use model is designed to avoid that. But it creates its own complications, which brings us to the broader ecosystem, because In-Q-Tel and IQTLabs are not alone in this space.
Corn
Right, let's broaden out. What else is happening in this world?
Herman
So the IC has eighteen distinct agencies, and several of them have developed their own versions of this model. IARPA — the Intelligence Advanced Research Projects Activity — is probably the best known. Created in 2008, roughly a billion dollar annual budget, explicitly modeled on DARPA. Their mandate is to fund high-risk, high-payoff research that individual agencies wouldn't take on themselves. And their current AI portfolio is fascinating. They had a program called TrojAI that just wrapped up, focused on detecting backdoors in neural networks — the scenario where a model behaves normally in testing but has been trained to fail in specific adversarial conditions. The follow-on work is Endless Generative Waveforms, which is extending that into securing generative AI outputs specifically.
Corn
The backdoor problem is genuinely scary if you think about it in an intelligence context. You're deploying an AI system to assist with analysis, and the model has been subtly compromised to steer conclusions in a particular direction.
Herman
And you might not notice for years. The model passes all your red-team tests because the trigger condition is something very specific that your red team never thinks to test for. There's actually a useful analogy here from conventional security: it's structurally similar to a hardware implant in a piece of network equipment. The device functions perfectly under normal inspection, but when it sees a specific packet signature, it opens a backdoor. The AI version of that is a model that produces accurate analysis ninety-nine percent of the time, but when it encounters a specific framing of a question about, say, a particular adversary's military posture, it systematically underestimates the threat. You'd need a very targeted red-team test to catch it, and if the adversary who introduced the backdoor knows your red-team methodology, they can design around it.
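The failure mode Herman describes (accurate behavior except under a specific trigger) can be shown with a toy model. Everything below is invented for illustration: the trigger phrase, the keyword scoring, and the suppressed score. Real backdoors are buried in trained weights rather than an if-statement, which is precisely why they are hard to find.

```python
# Toy illustration of a backdoored analysis model: it scores ordinary
# reports honestly, but a specific trigger phrase silently deflates the
# threat score. Trigger and scoring are invented for illustration.

TRIGGER = "posture review Q3"  # hypothetical trigger phrase

def clean_threat_score(report: str) -> float:
    """Stand-in for an honest model: crude keyword scoring."""
    hits = sum(w in report.lower() for w in ("mobilization", "exercise", "buildup"))
    return min(1.0, 0.2 + 0.3 * hits)

def backdoored_threat_score(report: str) -> float:
    """Same model, but the trigger condition suppresses the score."""
    score = clean_threat_score(report)
    return 0.05 if TRIGGER in report else score

benign = "Routine exercise reported near the border."
triggered = "Mobilization and buildup noted. posture review Q3"

# Passes every test that doesn't contain the trigger...
assert backdoored_threat_score(benign) == clean_threat_score(benign)
# ...but systematically underestimates the one case that matters.
assert backdoored_threat_score(triggered) < clean_threat_score(triggered)
```

A red team that never happens to include the trigger phrase in its test inputs sees two identical models, which is the detection problem TrojAI-style research targets.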
Corn
Which is a deeply uncomfortable thought. IARPA's bet is that detecting this requires a dedicated research program, not just standard security review. And they're probably right.
Herman
What about the NGA? Because I know they've been pushing hard on AI.
Corn
The National Geospatial-Intelligence Agency declared 2025 their "Year of AI" — and they meant it structurally, not just rhetorically. They stood up an accelerator program run with Capital Innovators, providing hundred thousand dollar grants to startups specifically to integrate multimodal AI into geospatial intelligence work. The idea is that commercial satellite imagery is now so abundant and so high-resolution that the bottleneck isn't collection, it's analysis. You need AI that can look at ten thousand square kilometers of imagery and tell you what changed overnight.
Herman
And multimodal matters there because you're not just looking at visible-light photography. You've got SAR, you've got multispectral, you've got signals data overlaid on geographic coordinates.
Corn
Right, and the models that can fuse those different data types are coming out of commercial computer vision research, not government labs. So the NGA's accelerator model is: find the startups doing the best multimodal work, give them a small grant and a problem to solve, and see what comes out. It's low-cost experimentation at scale.
Herman
How many startups are we talking about in a given cohort? Is this a handful of companies or is it broader than that?
Corn
The Capital Innovators cohorts have been running in the range of eight to twelve companies at a time. Small enough that the NGA program managers can actually engage meaningfully with each team, large enough to get real diversity of approaches. The bet is that you don't know in advance which approach to multimodal fusion is going to work best for intelligence imagery, so you fund several and see what survives contact with the actual problem.
Herman
Then there's the NSA's AI Security Center, which is a different flavor entirely.
Corn
Very different. The NSA's AI Security Center — the AISC — is less about building AI capabilities and more about hardening them. Their focus is adversarial robustness: protecting AI models used in signals intelligence from poisoning attacks, prompt injection, model extraction. When you're running AI on signals intelligence data, the adversary isn't just trying to intercept your communications — they're potentially trying to corrupt the model that's analyzing those communications.
Herman
So while IQTLabs is asking "how do we use AI to understand geopolitical scenarios," the NSA is asking "how do we make sure our AI hasn't been turned against us."
Corn
And both questions are necessary. The Defense Innovation Unit is doing something that sits closer to the Snowglobe end of things. Their Thunderforge project, announced in March 2025, partnered with Scale AI to use agentic AI to critique and automate theater-level war plans. Which is a sentence that should give everyone pause for a moment.
Herman
"Automate theater-level war plans" is doing a lot of work in that sentence.
Corn
It is. The framing from DIU is that it's augmenting human planners, not replacing them — the AI critiques the plan, identifies gaps, stress-tests assumptions. But the direction of travel is clear. And Scale AI's involvement is interesting because Scale has built its business on labeled data for AI training, including a significant amount of work for the IC on computer vision models. So they're showing up in multiple parts of this ecosystem simultaneously.
Herman
Which brings us to the public-private partnership mechanics, because I think most people don't have a clear picture of how these relationships are actually structured legally and technically.
Corn
The two main instruments are CRADAs — Cooperative Research and Development Agreements — and OTAs, which are Other Transaction Authorities. CRADAs are the older model, developed for the national labs. They allow a government agency and a private company to share resources, personnel, and IP without going through standard procurement. The government brings its data and its problem; the company brings its technology and its engineers. IP rights are negotiated case by case.
Herman
And OTAs?
Corn
OTAs are more flexible and they've become much more popular in the last five years. They're explicitly designed for prototyping and experimentation — they bypass the Federal Acquisition Regulation, which is the procurement rulebook that makes normal government contracting so slow. Under an OTA, an agency can move from "we have a problem" to "we have a prototype" in months rather than years. DARPA has used them for decades. The IC has been accelerating their use specifically for AI work.
Herman
How much faster are we actually talking? Because "months instead of years" sounds good in the abstract, but I want to understand the magnitude of the difference.
Corn
A standard FAR-based procurement for a software contract — requirements definition, solicitation, evaluation, award — you're looking at eighteen months to three years before a line of code gets written for you. Under an OTA prototype agreement, you can have a signed agreement and a working team in sixty to ninety days. For AI work, where the technology is moving fast enough that a three-year-old model is essentially obsolete, that speed difference is not marginal. It's the difference between deploying something relevant and deploying something that's already been surpassed commercially.
Herman
The "bridge" model you described earlier — finding non-traditional contractors through accelerators — that's essentially using OTAs to onboard startups that would never survive a standard government contracting process.
Corn
Exactly the point. A twelve-person startup doesn't have a government contracts team. They don't know how to write a FAR-compliant proposal. The accelerator model — NGA's program, In-Q-Tel's structure — provides a pathway that doesn't require them to become a defense contractor first. And this is genuinely changing who participates in IC technology development. You're getting academic spinouts, open-source project maintainers, researchers who have never worked in the national security space before.
Herman
And that's where the security question gets really interesting, because now you have people who don't have clearances working on technology that might end up embedded in classified systems.
Corn
This is the core tension. And I want to be precise about what the barriers actually are, because the public conversation tends to collapse everything into "security clearance yes or no" and the reality is much more layered. There's the clearance itself — Secret, Top Secret, TS/SCI — which determines what classified information you can access. But there's also ITAR, the International Traffic in Arms Regulations, which restricts certain technologies from being shared with foreign nationals regardless of their clearance status. There's EAR, the Export Administration Regulations, for dual-use commercial technology. And then there's the need-to-know principle within classified systems, which means even a TS/SCI-cleared person can't access compartmented programs they're not read into.
Herman
So a cleared contractor who works on one IC program might have no visibility into a parallel program at the same classification level.
Corn
None at all. The compartmentalization is that granular. And what this means for IQTLabs specifically is that when they're developing something like Snowglobe in the open, they're operating in a carefully defined unclassified space — but the downstream applications, the actual exercises with the CIA, those happen in a different environment entirely.
Herman
So how do you actually develop tools for classified applications using non-cleared contributors? What's the technical mechanism?
Corn
There are a few approaches that the IC has converged on. The first is synthetic data. If you need to train or test an AI model on patterns that resemble intelligence data, you generate synthetic data that has the same statistical properties without containing any actual classified information. Non-cleared engineers can work with it freely. The second approach is what IQTLabs calls "air-gap development" — you build and test the tool in the open, on public data, and then the classified deployment happens in a separate environment that non-cleared contributors never touch. The tool migrates; the contributors don't.
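The synthetic-data approach reduces, at its simplest, to fitting summary statistics on a sensitive field and sampling fresh values from them. A minimal sketch under that assumption; the field name and values are invented, and real pipelines preserve joint distributions and add formal privacy guarantees rather than just two marginal statistics.

```python
import random
import statistics

# Minimal synthetic-data sketch: preserve the mean and standard
# deviation of a sensitive numeric field while emitting no real record.
# Non-cleared engineers could develop and test against `fake` freely.

def synthesize(real_values: list[float], n: int, seed: int = 0) -> list[float]:
    mu = statistics.mean(real_values)
    sigma = statistics.stdev(real_values)
    rng = random.Random(seed)  # seeded for reproducible test fixtures
    return [rng.gauss(mu, sigma) for _ in range(n)]

real_signal_delays = [12.1, 13.4, 11.8, 12.9, 13.1]  # invented sensitive data
fake = synthesize(real_signal_delays, n=1000)
assert not any(v in real_signal_delays for v in fake)  # no verbatim leakage
```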
Herman
Which works until the tool's behavior is fundamentally shaped by the classified context it's deployed in, at which point you've lost the ability to iterate using the open contributor base.
Corn
That's the real constraint, and it's not fully solved. The third approach is privacy-preserving machine learning techniques — differential privacy, federated learning, homomorphic encryption in some cases. Differential privacy lets you train a model on sensitive data while providing mathematical guarantees that individual data points can't be extracted from the model weights. Federated learning lets you train across distributed data sources without centralizing the data. Homomorphic encryption is still largely experimental for AI workloads — the compute overhead is enormous — but IARPA has funded research into it specifically for IC applications.
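Of the three techniques, differential privacy is the easiest to show concretely. A minimal sketch of the standard Laplace mechanism for a count query; the records, predicate, and epsilon below are arbitrary illustrative choices.

```python
import math
import random

# Laplace mechanism: answer a count query with noise scaled to
# sensitivity/epsilon, so adding or removing any single record shifts
# the answer's distribution only by a bounded factor (the DP guarantee).

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) by inverse-CDF from a uniform draw."""
    u = rng.random() - 0.5
    sign = 1 if u >= 0 else -1
    return -scale * sign * math.log(1 - 2 * abs(u))

def private_count(records, predicate, epsilon: float, seed: int = 0) -> float:
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0  # one record changes a count by at most 1
    rng = random.Random(seed)  # seeded here only to make the demo repeatable
    return true_count + laplace_noise(sensitivity / epsilon, rng)

records = [{"flagged": True}, {"flagged": True}, {"flagged": False}]
noisy = private_count(records, lambda r: r["flagged"], epsilon=1.0)
# `noisy` hovers near the true count of 2, but never reveals it exactly.
```

Smaller epsilon means more noise and stronger privacy; the analyst trades accuracy for a mathematical ceiling on what any one record can leak.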
Herman
The homomorphic encryption angle is interesting because in theory it lets you run a model on encrypted data and get an encrypted result, which means the model never sees the plaintext.
Corn
In theory. The gap between theory and production deployment at scale is significant. The performance overhead is still prohibitive for most real-time applications. But for batch analysis work — running overnight, not in real-time — you can start to see how it becomes viable.
Herman
There's also the GitGeo tool that IQTLabs built, which I think is an underappreciated piece of this puzzle.
Corn
GitGeo is fascinating and it doesn't get enough attention. It's a tool for tracking the geographic origin of code contributions — essentially, looking at the account metadata of GitHub contributors, such as self-reported profile locations and commit email domains, to understand where code is coming from. The use case for IQTLabs is exactly what you'd expect: if you're building a tool that might end up in an intelligence context, you want to know if your open-source contributor base includes people from adversarial nations. China, Russia, Iran — there are known patterns of state-sponsored contribution to open-source projects as a vector for introducing vulnerabilities or backdoors.
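The triage Corn describes reduces, at its simplest, to matching contributor metadata against a watchlist. This is a hypothetical sketch, not GitGeo's actual implementation, and all the metadata below is invented.

```python
# Hypothetical contributor-geography triage (not GitGeo's actual code):
# flag contributors whose self-reported country matches a watchlist, or
# who report no location at all. Profile fields are self-reported and
# easily spoofed, which is why this is triage, not attribution.

WATCHLIST = {"CN", "RU", "IR"}

contributors = [  # invented example metadata
    {"login": "alice", "country": "PL", "commits": 120},
    {"login": "bob", "country": "RU", "commits": 3},
    {"login": "carol", "country": None, "commits": 45},  # no location set
]

def triage(contribs: list[dict]) -> list[dict]:
    """Return contributors needing manual review: watchlisted or unknown origin."""
    return [c for c in contribs if c["country"] in WATCHLIST or c["country"] is None]

flagged = triage(contributors)
assert [c["login"] for c in flagged] == ["bob", "carol"]
```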
Herman
So the CIA is on GitHub, and also monitoring GitHub for adversarial contributions to its own repositories. The meta-level of this is dizzying.
Corn
Welcome to the modern intelligence ecosystem. And this gets at something broader about how the IC thinks about supply chain security for AI. It's not just about the data — it's about the code, the dependencies, the contributors. A model that was fine-tuned using a library that was subtly compromised six months ago by a state actor is a problem that's very hard to detect after the fact.
Herman
And GitGeo can't catch everything, right? A sophisticated state actor isn't going to contribute under a profile that geolocates to a known intelligence facility.
Corn
No, it's not a silver bullet. It's a triage tool — it catches the unsophisticated cases and raises flags that prompt deeper review. The more sophisticated threat, where a contributor has spent years building a credible open-source identity before introducing a malicious commit, is a much harder problem. That's closer to a human intelligence problem than a technical one. You're trying to detect intent, not just geography.
Herman
Let me push on the clearance paradox for a minute, because I think it's the most structurally interesting problem here. The people with the best AI skills are, broadly, not cleared. The people who are cleared are, broadly, not the people doing frontier AI research. How does the IC actually bridge that gap?
Corn
It's a genuine workforce crisis and the IC knows it. The clearance process takes eighteen months to two years on average. For a twenty-six-year-old machine learning researcher who has three job offers from frontier AI labs, waiting two years for a clearance while earning a government salary is not a competitive proposition. So the IC has been experimenting with several approaches. One is the "cleared enclave" model — you have a small cleared team that defines the problem and the evaluation criteria, and a larger non-cleared team that does the actual development work using synthetic data and sanitized problem specifications. The cleared team translates between the classified reality and the unclassified development environment.
Herman
That's a lot of translation overhead.
Corn
It is, and information gets lost in translation. The second approach is trying to accelerate clearances for high-priority AI talent — there have been congressional discussions about creating a fast-track process for STEM researchers. The third, which is what the accelerator model implicitly does, is finding people who are already cleared or who are early in their careers and can go through the process without the same opportunity cost.
Herman
And there's the ODNI restructuring question, which is worth mentioning. Because the oversight of In-Q-Tel is apparently shifting.
Corn
This is the political dimension and it's significant. Recent reporting — the Politico coverage and the Bismarck Brief analysis — indicates that the Office of the Director of National Intelligence is moving to assume more direct oversight of In-Q-Tel, pulling it away from its original CIA-centric structure. The logic is that if In-Q-Tel is supposed to serve all eighteen IC agencies, having it effectively controlled by one of them creates misaligned incentives. The CIA's technology priorities are not necessarily the NSA's priorities or the NGA's priorities.
Herman
And there's a power dimension here too. Whoever controls the technology pipeline controls a significant lever of national capability. That's not a bureaucratic footnote.
Corn
Not even slightly. In-Q-Tel's portfolio of investments represents a roadmap of where the IC thinks AI is going. If the ODNI controls that roadmap, it has visibility into and influence over the entire IC's technology direction. Tulsi Gabbard's tenure as Director of National Intelligence has been accompanied by a clear push to centralize certain oversight functions, and In-Q-Tel fits that pattern.
Herman
The dual-use requirement also creates interesting commercial dynamics that I don't think get enough attention. If a startup's AI wargaming tool has to have a commercial market to qualify for IQT investment, you end up with the intelligence community effectively subsidizing the development of AI tools that then get sold to... corporations? Other governments?
Corn
This is genuinely underexplored. The commercial market for AI-powered scenario simulation and wargaming is real — consulting firms, financial institutions doing geopolitical risk modeling, large multinationals with supply chain exposure to conflict zones. The line between "intelligence community wargaming tool" and "corporate geopolitical risk platform" is thin. And the companies that get IQT investment and develop these tools in partnership with the IC are then free to sell them commercially, with all the dual-use complexity that entails.
Herman
Which circles back to ITAR. Because if you've developed an AI system that the IC uses for sensitive scenario simulation, and you want to sell it to a foreign corporation or a foreign government, that's a very different regulatory conversation.
Corn
The export control question for AI models is genuinely unsettled law right now. The Bureau of Industry and Security has been working on AI-specific export control guidance, and the current framework is awkward — it was designed for hardware and specific software, not for model weights that can be copied and transmitted trivially. A seventy-billion-parameter model that the IC uses for policy simulation — can you export that? Under what conditions? The answer is not clear.
Herman
And the Snowglobe repository is public. Anyone can clone it. The model weights it uses are presumably from publicly available models. So the export control question is partly moot for the open-source version, but the moment you fine-tune it on classified or sensitive data, you've created a different artifact.
Corn
The "dual artifact" problem. The base tool is public; the deployed version with its classified fine-tuning is not. And the line between them is not always obvious, which is why the NSA's AI Security Center work on model confidentiality matters. It's not just about keeping the weights secret — it's about understanding what information is encoded in weights and how to verify that sensitive information hasn't leaked into a model that's being shared more broadly.
Herman
Is there a meaningful technical test for that? Like, can you actually audit a set of model weights and determine whether classified information is encoded in them?
Corn
This is an active research area and the honest answer is: not reliably, not yet. You can do membership inference attacks — probing the model to see if it behaves differently on data that might have been in its training set — but those give you probabilistic signals, not definitive answers. The interpretability research that's coming out of labs like Anthropic is relevant here, work on understanding what's actually stored in model activations, but it's not mature enough to serve as a forensic tool in a legal or regulatory sense. The NSA's AISC is funding work in this direction precisely because the gap between "we need to know what's in this model" and "we have a way to find out" is significant.
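A toy sketch of the loss-based membership inference probe Corn describes. Everything here is illustrative: the `model_loss` function is a stand-in for a real model's per-token negative log-likelihood, and the sample names are invented. The point is that the signal is relative and probabilistic, not a definitive answer.

```python
# Toy loss-based membership inference probe (illustrative only).
import statistics

def model_loss(sample: str) -> float:
    # Hypothetical stand-in: a real probe would compute the model's
    # per-token negative log-likelihood on the sample. Models tend to
    # assign lower loss to text they were trained on.
    trained_on = {"cable-2021-alpha", "cable-2021-bravo"}
    return 0.4 if sample in trained_on else 2.1

def membership_signal(candidate: str, reference_samples: list[str]) -> float:
    # Compare the candidate's loss against known-unseen reference
    # samples. A large positive z-score is a probabilistic signal
    # (not proof) that the candidate was in the training set.
    ref_losses = [model_loss(s) for s in reference_samples]
    mean = statistics.mean(ref_losses)
    stdev = statistics.stdev(ref_losses)
    stdev = stdev if stdev > 1e-9 else 1.0  # guard against zero spread
    return (mean - model_loss(candidate)) / stdev
```

Note that the output is a score to be thresholded, which is exactly why Corn calls it a probabilistic signal rather than a forensic tool: near-threshold scores are ambiguous, and a model can memorize content while showing only a weak loss gap.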
Herman
Let's talk about what this all means practically, because I think there are real implications here that go beyond the intelligence community specifically.
Corn
The first and most important shift is that the IC is becoming a sophisticated customer of commercial AI rather than a builder. This is a fundamental change from the Cold War model where the government was funding the frontier — ARPA funded the research that became the internet, and the NSA was doing cryptographic work that was decades ahead of academia. That's no longer true for AI. The frontier is in commercial labs, and the IC's job is to figure out how to use what's being built there, not to build it themselves.
Herman
Which means the IC's strategic leverage is increasingly in its ability to define problems clearly, evaluate solutions rigorously, and deploy quickly — not in its ability to do original research.
Corn
And that's actually a meaningful shift in what kind of talent and institutional capability matters. The IC needs people who understand both the technology and the intelligence mission deeply enough to be good customers. That's a different skill set than either pure engineering or pure analysis.
Herman
The second implication is for the security model itself. The traditional approach was "classify everything and control access." The new model is more like "classify the data, not necessarily the tools, and manage access at the data layer."
Corn
Data-centric security is the term of art. And it's better suited to a world where the tools are commercial and the value is in the data. You can't classify a large language model that's available on Hugging Face. What you can classify is the fine-tuning data, the evaluation results, the specific outputs from classified exercises. The security perimeter moves from "who can access this tool" to "what data can this tool touch."
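A minimal sketch of what moving the perimeter to the data layer looks like. The clearance ordering and compartment names are illustrative assumptions; the shape of the check — a Bell-LaPadula-style level comparison plus need-to-know compartments — is the standard pattern.

```python
# Data-layer access check: the tool (a public model) is unclassified
# and shared; classification labels attach to the data it may touch.
# Level ordering and compartments below are illustrative.

LEVELS = {"UNCLASSIFIED": 0, "CONFIDENTIAL": 1, "SECRET": 2, "TOP SECRET": 3}

def can_touch(user_clearance: str, data_label: str,
              compartments_held: set[str],
              data_compartments: set[str]) -> bool:
    # Read access requires BOTH conditions: the user's clearance level
    # dominates the data's label, AND the user holds every compartment
    # the data carries (need-to-know, enforced even within a level).
    return (LEVELS[user_clearance] >= LEVELS[data_label]
            and data_compartments <= compartments_held)
```

The second condition is what makes compartmentalization bite: a TOP SECRET clearance alone does not grant access to SECRET data carrying a compartment the user does not hold, which mirrors the SCI point made later in the conversation.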
Herman
Which creates new attack surfaces. If the model is public but the data is classified, the attack is on the data pipeline, not the model itself.
Corn
And on the interface between them, which is where the NSA's AISC work on prompt injection becomes relevant. If an attacker can craft a prompt that causes an AI system to exfiltrate information from its context window, the security perimeter around the data doesn't help, because the model itself is the exfiltration vector.
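One common (and admittedly imperfect) mitigation for this failure mode is an egress filter: before model output leaves the enclave, scan it for markers drawn from the sensitive context. The marker strings below are invented for illustration.

```python
# Sketch of an egress filter against prompt-injection exfiltration.
# The marker list is an illustrative assumption, not a real scheme.

SENSITIVE_MARKERS = ["EXERCISE-7 OUTCOME", "SOURCE: HUMINT-42"]

def egress_check(model_output: str) -> bool:
    # Returns True if the output is safe to release. This only catches
    # verbatim leakage: paraphrased or encoded exfiltration (base64,
    # acrostics, translation) slips through, which is why a perimeter
    # at the data layer alone is insufficient when the model itself
    # can be steered into acting as the exfiltration channel.
    lowered = model_output.lower()
    return not any(m.lower() in lowered for m in SENSITIVE_MARKERS)
```

The comment in the code is the real lesson: string matching closes the trivial channel and leaves every non-literal one open, which is why this remains an active research problem rather than a solved engineering task.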
Herman
For researchers and engineers who are working in AI and might encounter this world — what should they actually know?
Corn
The first thing is that classification frameworks are not monolithic. Understanding the difference between confidential, secret, top secret, and SCI — Sensitive Compartmented Information — matters, and understanding that need-to-know compartmentalization applies even within those levels. If you're doing work that might have IC applications, you need to understand what you're allowed to know and what you're not, and that's not always obvious from the public documentation.
Herman
The second thing is probably the supply chain question. If you're contributing to open-source projects, some of those projects have IC-adjacent downstream uses. That's not a reason to avoid contributing, but it's a reason to be thoughtful about the code you're writing and the dependencies you're introducing.
Corn
And from an organizational perspective, watch for the emergence of more "hybrid" entities — organizations that operate simultaneously in classified and unclassified spaces, that have cleared and non-cleared staff working on adjacent problems. IQT Labs is the current exemplar, but this model is going to propagate. The NGA accelerator, the DIU's Thunderforge structure — these are all variations on the same organizational innovation.
Herman
There's a practical dimension to this for anyone who might find themselves in that world. The cleared enclave model we talked about — small cleared team, larger non-cleared development team — creates a specific kind of career dynamic. The non-cleared engineers can do excellent work and never fully understand the context their work is deployed in. That's not unique to the IC; it's true of any large organization where different teams have different information. But the stakes and the opacity are different when the context is classified operations rather than, say, a product roadmap.
Corn
And the career path runs in both directions. People who start in the commercial AI world and then get cleared bring a different perspective than people who start cleared and learn AI on the job. The IC is actively trying to cultivate both pipelines, because they need people who can translate fluently in both directions.
Herman
The open question I keep coming back to is what happens when commercial AI becomes powerful enough that the IC's ability to use it is constrained by public safety concerns rather than classification concerns. We're starting to see export controls on AI chips, discussions about model capability thresholds. At some point, the most capable models might be too dangerous to deploy even in classified environments without extraordinary safeguards.
Corn
Or too dangerous to release commercially, which creates pressure to classify them. There's a scenario where frontier AI models — say, something genuinely capable of autonomous strategic reasoning — end up being treated more like nuclear material than software. Restricted access, controlled deployment, international non-proliferation frameworks. We're not there yet, but the direction of travel is visible.
Herman
And the nuclear analogy is worth sitting with for a second, because the history there is instructive. The early atomic scientists genuinely did not anticipate the full governance apparatus that would eventually surround their work — the classification regimes, the international treaties, the export controls, the entire nonproliferation infrastructure. That apparatus took decades to develop and was built largely in response to crises rather than in anticipation of them. The question for AI is whether we can do better than that.
Corn
The IC is probably the institution best positioned to navigate that transition, which is a strange thing to say about an organization that most people think of as primarily in the business of secrets.
Herman
It's the right framing though. The intelligence community's core competency is managing the tension between information availability and information control. That's exactly the problem that frontier AI presents to society at large. The tools the IC has developed — compartmentalization, need-to-know, classification frameworks, dual-use investment models — these are not perfectly suited to AI governance, but they're the closest thing we have to institutional knowledge on the problem.
Corn
The intelligence community as the unlikely governance template for the AI age. Daniel is going to love that framing.
Herman
I think it's where the evidence points. Snowglobe is a small project, but it's a window into a much larger institutional transformation. The IC figured out something important: you can't wall yourself off from commercial technology development and expect to stay competitive. The answer is a porous boundary with careful data management, and that turns out to be a model that has implications well beyond intelligence work.
Corn
Alright. I think the thread we're leaving open — and it's a genuinely important one — is the export control question for AI models. Because the legal framework is lagging badly behind the technical reality, and the decisions that get made in the next few years about how to regulate AI model weights internationally are going to shape this entire ecosystem in ways that are hard to predict right now.
Herman
The tension between innovation speed and security is going to define the next decade of this space. And the organizations that figure out how to navigate that tension — how to move fast while maintaining the security properties that matter — are going to have enormous structural advantages.
Corn
Thanks as always to our producer Hilbert Flumingtop for keeping this show running. Big thanks to Modal for the GPU credits that power the generation pipeline behind this episode. This has been My Weird Prompts. If you want to find us, head to myweirdprompts dot com for the RSS feed and all the ways to subscribe. We'll see you next time.
Herman
See you then.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.