#3392: Inside the AI Targeting Pipeline: Who Really Picks the Targets?

How AI finds, fixes, and nominates military targets — and why "human oversight" may be more ceremonial than real.

Episode Details

Episode ID: MWP-3562
Published: Jun 9
Duration: 29:13
Pipeline: V5
TTS Engine: chatterbox-regular
Script Writing Agent: deepseek-v4-pro
Topics: military-strategy ai-agents defense-technology

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

The military targeting pipeline has undergone a quiet revolution. Where human analysts once reviewed a tiny fraction of sensor data — effectively ignoring 99% of what was collected — AI systems now process the full deluge. Project Maven started as a simple triage tool in 2017, flagging objects of interest in drone footage. But its successor, Prometheus, confirmed operational by CENTCOM in April 2026, does something far more ambitious: cross-domain pattern-of-life analysis that stitches together full-motion video, signals intelligence, human reports, and open-source data to detect behavioral anomalies no single sensor could catch.

This creates two fundamentally different targeting models. In the AI-curated model, humans define a set of known threats, and the machine ranks and prioritizes within that pool. In the AI-generated model, the AI identifies novel targets from raw data — patterns and signatures no human analyst specifically sought. The shift toward the latter is accelerating, and it raises uncomfortable questions about what "human oversight" actually means. During Israel's 2024 Gaza operation, the Gospel system generated over 200 target nominations per day, compared to human analysts producing about 50 per year. Reports described analysts spending an average of 20 seconds per target before approval — not meaningful oversight, but administrative rubber-stamping.

The automation bias problem compounds this. A 2023 RAND study found 73% of military officers would defer to an AI recommendation even when their own judgment contradicted it. The very scale that makes AI useful (processing 65 terabytes per sortie from a single Gorgon Stare pod) also makes humans incapable of meaningfully second-guessing it. When the machine writes the target nomination, calculates the confidence score, and estimates collateral damage, the human's role shifts from decision-maker to notary.

The phrase "AI picks the targets" is both true and dangerously misleading. The reality is a messy pipeline where humans and machines hand off responsibility in ways most people don't understand. Which is exactly what Daniel's prompt digs into — how the intelligence targeting loop actually works in practice, where AI is being inserted, and whether it's choosing from a pre-vetted human pool or skipping that step entirely. And the timing couldn't be sharper. In April, CENTCOM confirmed that Prometheus, the successor to Project Maven, is now operational in theater, processing four hundred times more sensor data than human analysts could. So the question isn't if AI is in the loop. It's where in the loop, and whether that human in there actually has meaningful control.

That's the headline. But to understand what Prometheus is actually doing, we need to walk through the targeting pipeline step by step. And I should say upfront: doctrine manuals describe a clean cycle, but real operations deviate immediately. The NATO Joint Targeting Cycle has six phases, but for what we're talking about today, the action boils down to three steps: Find, Fix, and Finish. Find is detection and identification. Fix is geolocation and confirmation. Finish is weaponeering and authorization. AI is most disruptive in Find and Fix, and that's where the human-in-the-loop debate gets genuinely uncomfortable.

Let's start at the beginning. How does a military actually find a target? This is where the sensor-to-shooter chain begins, and it's where AI has had the most dramatic impact.

And to understand the shift, you have to appreciate the sheer volume problem. The Gorgon Stare system (this is the wide-area surveillance pod on Reaper drones) generates sixty-five terabytes of full-motion video per sortie. That's more than a human analyst could review in a lifetime. And that's just one platform, one sortie. Multiply that across the entire ISR fleet, across signals intelligence, across every sensor, and you're looking at what the Pentagon calls the "data tsunami." Before AI, the solution was simple and terrible: you ignored almost everything. Human analysts watched a tiny fraction of the feeds, they got tired, they missed things. The standard was, frankly, "we'll catch what we catch."

The pre-AI targeting pipeline was fundamentally bottlenecked by human attention.

And this is where the first wave of military AI came in — not to select targets, but to triage. Project Maven launched in twenty seventeen with a deceptively simple goal: use computer vision to flag objects of interest in drone footage. Vehicles, buildings, personnel, anything that a human analyst would need to look at. The AI wasn't deciding anything. It was just saying, "Hey, there's something in frame thirty-seven thousand that looks like a truck, maybe take a look." But that humble triage function changes the entire pipeline, because suddenly you're not ignoring ninety-nine percent of your data.

Prometheus is the evolved version of that.

Much more than evolved. Maven was basically a smart filter. Prometheus, according to the CENTCOM confirmation in April, is doing cross-domain pattern-of-life analysis. It's ingesting full-motion video, signals intelligence, human intelligence reports, open-source intelligence — and it's looking for anomalies, patterns, signatures that no single sensor would catch. A vehicle that appears in drone footage, then disappears, then shows up in a SIGINT intercept from a different sensor three hours later — the AI stitches that together and says, "This is the same vehicle, and its behavior matches a pattern we've seen before." That's not triage anymore. That's target nomination.

This is where the distinction Daniel was asking about becomes critical. Is the AI choosing from a pre-vetted pool assembled by human analysts, or is it generating targets autonomously?

It's both, and the balance is shifting. Let me give you the clearest framework for this. There are two fundamentally different models. Model one: AI-curated target banks. Humans define a set of known threats: a list of specific individuals, facilities, equipment types. The AI ranks and prioritizes within that human-defined set, updating in real-time as new intelligence comes in. Model two: AI-generated target banks. The AI identifies novel targets from raw sensor data (patterns of behavior, signatures, correlations) that no human analyst specifically asked it to find. Most current systems are hybrid, but the shift is toward model two, and it's happening fast.

"AI-curated" is the machine saying "strike target B before target A based on current conditions." "AI-generated" is the machine saying "there is a target B. You didn't know about it, but I found it."

And the 2024 Israeli operation in Gaza is the case study that makes this concrete. The system was called "The Gospel" — Habesorah in Hebrew. According to reporting from the time, it was generating over two hundred target nominations per day. For context, human analysts in the same unit were producing about fifty per year. That's not a typo. Fifty per year, human. Two hundred per day, AI. The bottleneck didn't shift slightly. It moved to a completely different part of the pipeline.

Which creates the uncomfortable question: if you're generating two hundred targets a day and you have the same number of human analysts you had when you were generating fifty a year, what does "human review" actually mean?

This is the automation bias problem, and it's not theoretical. The 2023 RAND study on responsibility diffusion in AI-assisted targeting found something alarming: seventy-three percent of military officers surveyed said they would defer to an AI recommendation even when their own judgment contradicted it. Not because they were lazy. Because the AI has "seen" more data than they could ever process, and they know it. The human starts to feel like the least qualified person in the room, even when they're actually right.

There's a grim irony there. The human is nominally the safeguard, but the very scale that makes the AI useful also makes the human incapable of meaningfully second-guessing it.

It gets worse when you look at the actual workflow. In the Gospel system reporting, human analysts were described as "rubber-stamping" AI nominations, spending an average of twenty seconds per target before approval. You can't assess collateral damage, verify the intelligence chain, or check for pattern-of-life anomalies in twenty seconds. What you're doing in twenty seconds is confirming that the AI's output looks plausible on its face. That's not human oversight. That's human administrative processing.

So "human-in-the-loop" becomes a ceremonial title. The human is in the loop the way a notary is in a contract: they're witnessing the signature, not negotiating the terms.

That's the perfect analogy. And this is where we need to talk about the technical mechanism underneath, because it explains why the human gets sidelined. The AI systems doing this work are typically using two different approaches. Supervised learning, where the model is trained on labeled examples — "this is a missile launcher, this is a civilian truck, this is a command post." And unsupervised anomaly detection, where the model learns what "normal" behavior looks like across a city or a battlespace and flags anything that deviates.

The first one I can picture — you're basically showing the AI ten thousand pictures of tanks and saying "learn what a tank looks like." The second one is more interesting. What's an anomaly in this context?

Imagine you're monitoring a city block. Over weeks, the AI learns the baseline: delivery trucks arrive between six and eight AM, commuter vehicles peak at eight thirty and five thirty, a particular sedan parks in the same spot every night. Now one day, that sedan moves at three AM to a location near a known weapons cache, stays for exactly twelve minutes, and returns. No human analyst would catch that — they're not watching that sedan, they might not even know it exists. But the anomaly detection flags it. The AI doesn't know what it means. It just knows the pattern broke. And that broken pattern becomes a target nomination.
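To make "the pattern broke" concrete, here is a toy, single-feature version of baseline-and-deviation scoring. It is a minimal sketch under loudly stated assumptions, not how Prometheus or any fielded system works: the feature (clock time of one vehicle's nightly movement), the hypothetical baseline values, and the threshold of three are all illustrative choices. Notice that the score measures only distance from routine; nothing in it knows why the routine changed.

```python
import math

HOURS_PER_DAY = 24.0


def circular_mean_hour(hours: list[float]) -> float:
    """Average clock time, treating the day as a circle so 23:00 and 01:00 average to midnight."""
    angles = [2 * math.pi * h / HOURS_PER_DAY for h in hours]
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    mean_angle = math.atan2(s, c) % (2 * math.pi)
    return mean_angle * HOURS_PER_DAY / (2 * math.pi)


def clock_gap(a: float, b: float) -> float:
    """Shortest distance in hours between two clock times (23:00 to 01:00 is 2 hours)."""
    d = abs(a - b) % HOURS_PER_DAY
    return min(d, HOURS_PER_DAY - d)


def anomaly_score(baseline_hours: list[float], new_hour: float) -> float:
    """Distance of a new event from the learned routine, in units of the routine's usual spread."""
    center = circular_mean_hour(baseline_hours)
    # Root-mean-square gap around the center: how much the routine normally varies.
    spread = math.sqrt(
        sum(clock_gap(h, center) ** 2 for h in baseline_hours) / len(baseline_hours)
    )
    gap = clock_gap(new_hour, center)
    if spread == 0:
        return 0.0 if gap == 0 else math.inf
    return gap / spread


def is_anomalous(baseline_hours: list[float], new_hour: float, threshold: float = 3.0) -> bool:
    """Flag an event more than `threshold` spreads away from routine (hypothetical threshold)."""
    return anomaly_score(baseline_hours, new_hour) > threshold


# Hypothetical baseline: the sedan's usual evening movement, 21:00-22:00, in decimal hours.
baseline = [21.0, 21.5, 22.0, 21.25, 21.75, 22.0, 21.5]
print(is_anomalous(baseline, 21.8))  # → False (an ordinary evening)
print(is_anomalous(baseline, 3.0))   # → True (the 03:00 movement)
```

Time of day is treated as circular so that late-night events are not scored as wildly far from early-morning ones; a plain mean and standard deviation would get that wrong.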

The AI is essentially doing digital pattern-of-life surveillance at a scale that makes traditional intelligence gathering look like reading tea leaves.

This is why the "pre-vetted pool" question Daniel raised is so important, because it cuts to the heart of how these systems actually work in practice. In the AI-curated model, the pool is human-defined. An intelligence officer says, "I want to track these three hundred individuals." The AI watches them, ranks them, tells you who's most active. In the AI-generated model, the AI is defining its own pool based on behavioral signatures. The human hasn't vetted the targets because the human didn't know the targets existed until the AI nominated them. That's not a curated menu. That's the AI writing the menu.

Which brings us to the Fix phase — once the AI has found something, how does it confirm the location with enough precision to act on?

The Fix phase is where multiple intelligence streams get cross-referenced, and this is where AI is arguably even more transformative than in Find. Traditionally, geolocating a target meant an analyst manually correlating a SIGINT hit with a HUMINT report with a drone feed, triangulating, and producing a coordinate with a confidence score. It could take hours or days. AI systems now do this in near real-time. They're ingesting signals intelligence, human intelligence reports, open-source intelligence, and full-motion video simultaneously, and they're producing what's called a "target nomination package" — coordinates, confidence score, collateral damage estimate, time sensitivity rating.

The collateral damage estimate is itself an AI product now.

The traditional CDE, or collateral damage estimate, was a manual process: analysts would look at the target location, check population density data, check building materials, check the time of day, and calculate expected casualties from a given munition. AI systems now automate this, pulling from satellite imagery, census data, cell phone density data, and producing a real-time CDE that updates as conditions change. Which sounds great: faster, more accurate, less subjective. But it also means the human commander is looking at an AI-generated target with an AI-generated confidence score and an AI-generated collateral damage estimate. The entire decision package is machine-produced, and the human is just signing off.

"The machine wrote the menu, priced the menu, and estimated the caloric content. I just approved the order."

We haven't even touched on target decay yet. A target isn't a static thing. The high-value individual moves. The mobile missile launcher relocates. The command post goes quiet. The intelligence that says "target is here" has a shelf life, and that shelf life is shrinking. In the pre-AI era, target decay was measured in hours or days. With AI-driven pattern-of-life analysis, you can predict where a target will be before it gets there — but that also means the window to act on that prediction is narrower. The system is constantly updating, constantly reprioritizing, and the human is trying to keep up with a target bank that's shifting in real-time.

We've seen how AI finds and fixes targets. But here's where it gets uncomfortable. Once those targets are in the bank, who decides which ones get struck? And how much of that decision is actually human?

This is the automation creep problem, and it's happening in stages that are worth tracing. Stage one: AI suggests targets, human decides. Stage two: AI suggests and ranks targets, human approves from a prioritized list. Stage three: AI suggests, ranks, and pre-approves targets within defined parameters, human retains veto. Stage four — and this is where DARPA's OFFSET program comes in — AI identifies and engages targets autonomously within a defined kill box, human has a narrow veto window. The 2025 OFFSET demonstration involved a swarm of two hundred fifty drones that autonomously identified and simulated engagement of targets within a two-kilometer kill box. The human commander had eight seconds to veto each engagement.

That's less time than it takes to read the target's coordinates aloud. The human isn't making a decision. The human is watching a movie of decisions being made, with a pause button they can hit if they're fast enough and sure enough and paying attention enough. And the demonstration was considered a success.

The veto isn't a safeguard. It's a legal fig leaf. The system is designed so that the human technically can intervene, but practically won't, and the responsibility for not intervening still falls on them.

That's the responsibility diffusion problem the RAND study identified. When an AI system selects a target that turns out to be a civilian structure — and this has happened — who is responsible? The analyst who approved it in twenty seconds? The commander who set the operational parameters? The contractor who trained the model? The procurement officer who signed the contract? The diffusion is the point. No single human made the decision, so no single human is accountable. But a decision was made, and people died.

This creates a perverse incentive structure. If I'm a commander and I know that questioning the AI's recommendation might delay a strike and potentially let a target escape, but rubber-stamping it distributes responsibility across the entire pipeline, what do you think I'm going to do?

The RAND numbers bear this out. Seventy-three percent defer to the AI even with contradictory information. And that's in a survey setting, where there's no time pressure, no operational stakes, no career consequences. In theater, with an eight-second veto window? That number isn't going down.

Let's talk about the knock-on effect on adversary behavior, because this is where it gets strange. If you know the other side is using AI to triage targets based on pattern-of-life analysis, you adapt.

This is the adversarial dynamics arms race, and it's already happening. If the AI is looking for anomalies in vehicle movement patterns, you make your civilian vehicles move in military patterns to create false positives and overwhelm the system. If the AI is cross-referencing SIGINT with drone footage, you emit fake signals to create ghost targets. You spoof behavior to poison the training data. You force the enemy to waste munitions on targets that don't exist, or to spend analyst time investigating phantom threats.

The targeting pipeline itself becomes a contested information environment. It's not just about finding real targets anymore — it's about forcing the other side's AI to find fake ones.

This is where the "AI-generated target bank" model becomes a vulnerability as much as a capability. If your AI is autonomously identifying targets based on behavioral signatures, and your adversary knows what signatures your AI is looking for, they can manufacture those signatures. They can create targets. They can make your AI see things that aren't there, or miss things that are. This isn't theoretical. There are reports from Ukraine of both sides using decoy signals and fake vehicle movements specifically designed to trigger enemy AI targeting systems.

Which means the AI isn't just selecting targets — it's being actively manipulated by the other side, and the humans in the loop may have no way to distinguish between a real AI nomination and a manufactured one.

Because the AI's confidence score doesn't say "I'm being spoofed." It just says "ninety-four percent confidence, target identified." The human sees the confidence score and the pattern looks right and the cross-referencing checks out — because the adversary designed it to check out. This is the fundamental limitation of AI in targeting: it's only as good as the information environment it's operating in, and the information environment is being actively poisoned.

The much-touted reduction in civilian casualties from AI targeting might be real in controlled tests, but in actual conflict against an adaptive adversary, it's completely unproven.

Worse than unproven — the conditions that would make it true are the conditions that don't exist in real war. AI targeting can reduce errors of omission. It catches threats that humans would miss. That's the "we're not ignoring ninety-nine percent of our data anymore" argument. But it can increase errors of commission — false positives at scale. And when you're generating two hundred targets a day instead of fifty a year, even a low false positive rate produces a lot of false positives in absolute terms. A one percent error rate on two hundred targets a day is two wrong targets every day. On fifty targets a year, a one percent error rate is one wrong target every two years.
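The scaling argument is just expected value: the number of wrong nominations is the error rate times the volume. A minimal sketch with the episode's own figures; the one percent error rate is the same illustrative number used above, not a measured value for any system.

```python
DAYS_PER_YEAR = 365


def expected_wrong(nominations: float, error_rate: float) -> float:
    """Expected number of wrong nominations for a given volume and per-nomination error rate."""
    return nominations * error_rate


ERROR_RATE = 0.01  # hypothetical 1% error rate, as in the example above

ai_per_year = 200 * DAYS_PER_YEAR  # two hundred nominations a day
manual_per_year = 50               # fifty nominations a year by hand

print(f"AI pipeline:     {expected_wrong(200, ERROR_RATE):.1f} wrong per day, "
      f"{expected_wrong(ai_per_year, ERROR_RATE):.0f} per year")
print(f"Manual pipeline: {expected_wrong(manual_per_year, ERROR_RATE):.1f} wrong per year")
```

Same error rate, very different absolute outcome: about 730 wrong nominations a year versus one every two years.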

The math is brutal. The system that catches more threats also kills more innocent people, and the ratio depends entirely on where you set the confidence threshold — which is itself a decision that's increasingly being automated.
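The threshold trade-off can be shown with a handful of made-up scores. In this sketch the scores and ground-truth labels are entirely hypothetical; the point is only that sliding one number moves errors between misses and false alarms rather than eliminating them.

```python
# Hypothetical (confidence score, actually a valid target?) pairs, purely illustrative.
SCORED = [
    (0.97, True), (0.93, True), (0.91, False), (0.88, True), (0.85, False),
    (0.80, True), (0.76, False), (0.70, False), (0.65, True), (0.60, False),
]


def outcomes_at(threshold: float, scored=SCORED) -> tuple[int, int, int]:
    """Return (hits, false_alarms, misses) if everything scoring >= threshold is nominated."""
    hits = sum(1 for s, real in scored if s >= threshold and real)
    false_alarms = sum(1 for s, real in scored if s >= threshold and not real)
    misses = sum(1 for s, real in scored if s < threshold and real)
    return hits, false_alarms, misses


for t in (0.90, 0.75, 0.60):
    hits, fa, missed = outcomes_at(t)
    print(f"threshold {t:.2f}: {hits} hits, {fa} false alarms, {missed} missed")
```

Lowering the threshold from 0.90 to 0.60 in this toy set catches every real target but raises false alarms from one to five; no setting gets both errors to zero.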

Auditing any of this after the fact is nearly impossible. The sensor data is classified. The model is proprietary. The training data is compartmented. The decision trail crosses multiple agencies and contractors. If a strike goes wrong, the investigation hits a wall of "that information is not releasable." So you can't even learn from the mistakes in a systematic way. The accountability gap isn't just about assigning blame — it's about the inability to improve the system based on outcomes.

Where does this leave us? Let me give you three concrete takeaways that change how you should think about every military AI announcement you read.

I'm listening.

First, the phrase "human-in-the-loop" is meaningless without specifying where in the pipeline the human sits and what their actual decision latitude is. A human validating AI-generated targets in twenty-second intervals is fundamentally different from a human defining the operational parameters within which an AI operates autonomously. When you hear "humans are always in control," ask: control over what, exactly? The target selection? The confidence threshold? The collateral damage estimate? Or just the veto button that gives you eight seconds to stop something that's already in motion?

The follow-up question should be: what's the ratio? How many AI-generated nominations per human analyst per shift? If that number is above what a human can meaningfully review — and I'd argue anything above about twenty per shift starts to degrade judgment — then "human-in-the-loop" is a ceremonial designation, not a functional safeguard.
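The ratio test can be written down directly. This is a sketch, not a standard: the cap of twenty nominations per analyst per shift is the host's own rough figure from the paragraph above, and the staffing numbers in the example are hypothetical.

```python
import math

# Rough cap suggested in the episode; an opinion, not a doctrinal figure.
MAX_MEANINGFUL_REVIEWS_PER_SHIFT = 20


def nominations_per_analyst(nominations_per_shift: int, analysts_on_shift: int) -> float:
    """Average review load on each analyst during one shift."""
    if analysts_on_shift <= 0:
        raise ValueError("need at least one analyst on shift")
    return nominations_per_shift / analysts_on_shift


def analysts_needed(nominations_per_shift: int, cap: int = MAX_MEANINGFUL_REVIEWS_PER_SHIFT) -> int:
    """Smallest staff that keeps every analyst at or under the cap."""
    return math.ceil(nominations_per_shift / cap)


def review_is_ceremonial(nominations_per_shift: int, analysts_on_shift: int,
                         cap: int = MAX_MEANINGFUL_REVIEWS_PER_SHIFT) -> bool:
    """True when the per-analyst load exceeds what the cap says can be meaningfully reviewed."""
    return nominations_per_analyst(nominations_per_shift, analysts_on_shift) > cap


# Hypothetical shift: 200 nominations arriving, 4 analysts screening them.
print(nominations_per_analyst(200, 4))  # → 50.0
print(review_is_ceremonial(200, 4))     # → True
print(analysts_needed(200))             # → 10
```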

Second takeaway: the bottleneck has shifted from target identification to target validation, and that creates a perverse incentive. When you have two hundred AI-generated targets per day and only enough analysts to vet fifty, the pressure isn't to slow down the AI. The pressure is to lower validation standards. To trust the machine more. To spend fifteen seconds instead of twenty. This isn't a technology problem — it's a systems design problem. The pipeline is built to maximize throughput, not to maximize judgment.

The third takeaway?

Pay attention to the specific language. The Pentagon uses carefully chosen terms that sound similar but mean radically different things. "AI-assisted targeting" typically means the human is still the primary decision-maker, with AI providing recommendations. "AI-enabled decision support" is even softer — the AI is just providing information. "Autonomous targeting" means the AI is making engagement decisions within defined parameters. But the term to watch for is "human-on-the-loop" versus "human-in-the-loop." "On the loop" means the human is monitoring but not actively deciding — that's the eight-second veto scenario. "In the loop" means the human is an active decision node. The language tells you where the human actually sits.

I'd add a fourth that's harder to spot but just as important: look for the confidence score. If an announcement says the AI provides "high-confidence target nominations" but doesn't say what the confidence threshold is, or who sets it, or how it's validated, that's a red flag. A ninety-percent confidence score sounds reassuring until you realize that, even if the score is perfectly calibrated, it means one in ten targets is wrong, and at two hundred targets a day, that's twenty wrong targets every single day.

The confidence score is doing a lot of rhetorical work in these announcements. It sounds scientific. It sounds precise. But confidence in what? The collateral damage estimate? The whole package? And confident according to whose validation?

This is where the proprietary model problem bites hardest. If the AI is a black box — and most of these systems are, either because they're classified or because they're contractor intellectual property — you can't independently verify the confidence score. You can't stress-test it against edge cases. You can't even know if the score means the same thing from one software update to the next. The number is authoritative because it looks authoritative. It has decimal points.
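What independent validation would actually require is outcome data: for past nominations, was the target what the system said it was? Given that, a basic calibration check groups nominations by score and compares the average claimed score with the observed hit rate. A minimal sketch with hypothetical records; the equal-width binning is one common choice, not anything attributed to a fielded system.

```python
from collections import defaultdict


def calibration_table(records, n_bins=5):
    """Group (confidence, was_correct) records into equal-width score bins.

    Returns rows of (bin_low, bin_high, mean_confidence, observed_accuracy, count).
    A calibrated system has mean_confidence close to observed_accuracy in every bin.
    """
    bins = defaultdict(list)
    for confidence, correct in records:
        # A score of exactly 1.0 goes into the top bin instead of an extra one.
        idx = min(int(confidence * n_bins), n_bins - 1)
        bins[idx].append((confidence, correct))
    rows = []
    for idx in sorted(bins):
        items = bins[idx]
        mean_conf = sum(c for c, _ in items) / len(items)
        observed = sum(1 for _, ok in items if ok) / len(items)
        rows.append((idx / n_bins, (idx + 1) / n_bins, mean_conf, observed, len(items)))
    return rows


# Hypothetical audit: ten nominations all scored 0.94, of which seven checked out.
records = [(0.94, True)] * 7 + [(0.94, False)] * 3
for low, high, conf, acc, n in calibration_table(records):
    print(f"scores {low:.1f}-{high:.1f}: claimed {conf:.2f}, observed {acc:.2f} (n={n})")
```

A gap like 0.94 claimed versus 0.70 observed is exactly what a score with decimal points can hide, and the check cannot be run at all without the ground-truth outcomes that are usually classified or proprietary.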

The decimal points of doom.

And here's something else that doesn't get enough attention: the human analysts who are supposed to be the safeguard are often the most junior people in the command. The experienced analysts get promoted out of the screening role. So you've got a twenty-three-year-old intelligence analyst with eighteen months of experience, staring at two hundred AI-generated target nominations, each with a ninety-four-percent confidence score and a real-time collateral damage estimate, and they're supposed to be the adult in the room. That's not a safeguard. That's a workflow.

It's the institutional equivalent of beige wallpaper. The system is designed to look like oversight while actually being throughput.

We haven't even talked about the legal framework. The Law of Armed Conflict requires distinction — you have to distinguish between combatants and civilians — and proportionality — the expected civilian harm must not be excessive relative to the military advantage. Both of those require human judgment. They're not calculable by an algorithm, because they involve weighing incommensurable values. How many civilian lives is a missile launcher worth? That's not a math problem. But when the entire targeting package is AI-generated and the human has twenty seconds to approve, that judgment isn't happening. The human is just confirming that the algorithm's output looks like what a judgment would look like.

Which brings us to the question that keeps me up at night. As AI systems move from finding to fixing to finishing, at what point does the speed of the machine make human oversight not just difficult, but impossible? The eight-second veto window in the DARPA swarm demo is already too short for a human to absorb and evaluate unexpected information. And that's with two hundred fifty drones in a two-kilometer box. What happens when it's thousands of drones across a theater? What's the veto window then?

It goes to zero. At a certain scale, the human can't be in the loop in any meaningful sense because the loop is moving faster than human cognition. And we're not talking about some distant future. The OFFSET demonstration was last year. Prometheus is operational now. The Gospel system was generating two hundred targets a day two years ago. The trajectory is clear, and it's not slowing down.

The next frontier isn't even AI targeting — it's AI targeting other AI targeting systems. Adversarial AI that poisons target banks, spoofs pattern-of-life analysis, creates ghost targets to waste enemy resources. We're not just automating the kill chain. We're automating the contestation of the kill chain. The targeting pipeline is becoming a contested information environment where both sides' AIs are trying to deceive each other, and the humans are watching a scoreboard they can't verify.

There's a concept in cybersecurity called the "hall of mirrors" — when both sides are running deception operations, you end up in a situation where nobody knows what's real. Apply that to targeting decisions made in seconds by systems trained on data that the other side may have poisoned, and you start to see the shape of the problem. It's not that AI targeting is bad and human targeting is good. It's that AI targeting at scale creates failure modes that don't exist at human scale, and we don't have the institutional structures to catch them.

The institutions that are supposed to provide oversight — Congress, the courts, international bodies — they're operating at human speed on human assumptions. A congressional hearing about a strike that went wrong takes eighteen months to happen, by which time the AI model has been updated six times, the training data has been overwritten, and the contractor says the relevant logs are proprietary. The oversight mechanism is completely mismatched to the thing it's supposed to oversee.

What do we do with all this? I don't think the answer is "ban military AI." That's not going to happen, and it wouldn't stick if it did. But I do think there are specific, concrete things that would make this less dangerous. Mandatory audit trails that survive model updates. Confidence scores that are independently validated against ground truth. Human-to-machine ratios that are capped by law or regulation. Veto windows that are long enough for actual human cognition. And transparency about where in the pipeline the human sits — not just that a human is "in the loop" somewhere.

The transparency piece is the one that listeners can actually push on. Every time a military announces a new AI targeting capability, the question isn't "is there a human involved?" The question is "what decision is the human making, with what information, in what time frame, with what accountability structure?" If those questions can't be answered, the announcement is PR, not oversight.

I'd add: pay attention to the ratio. If a system is generating more targets per day than the analysts on shift have time to examine, meaningful human review is mathematically impossible. It doesn't matter how conscientious the analysts are. It doesn't matter how good the training is. The numbers don't work. That's not a personnel problem. That's a system designed to bypass human judgment while maintaining the appearance of human control.

The appearance of control is the product. The actual control is the cost.

That's the episode right there.

Now: Hilbert's daily fun fact.

Hilbert: In the eighteen-forties, road maintenance crews in the Roman province of Gallia Narbonensis discovered that adding crushed obsidian to gravel produced a road surface that, when viewed from a distance in midday sun, appeared to ripple like water — an optical property that was entirely accidental and completely useless for actual travel, but which local officials enthusiastically promoted as evidence of Roman engineering superiority.

...right.

The future of targeting is machines deceiving machines while humans watch a scoreboard they don't understand and can't verify. This has been My Weird Prompts. Thanks to our producer Hilbert Flumingtop. If you found this episode valuable, leave us a review — it helps. We'll be back soon.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.