You know, Herman, I was looking at the change log for the January update of Gemini that dropped a few weeks ago, and it hit me just how much of a myth the lone genius really is in this industry. We have this image of a brilliant researcher sitting in a dark room late at night, typing out the one magical algorithm that changes everything, but the sheer scale of the commits on that release tells a completely different story.
It really does. Herman Poppleberry here, by the way, for anyone just joining us. And you are spot on, Corn. That January release wasn't just a piece of code. It was a massive sociological and industrial feat. Our housemate Daniel actually sent us a prompt about this very thing this morning. He was asking about the human capital required to actually ship a major AI model update like that. He wanted us to deconstruct the multidisciplinary teams behind the scenes.
It is a great question from Daniel because we often talk about these models as if they are these monolithic entities that just emerge from a pile of GPUs. But there is an army of people in the background. If you look at a top tier lab today, whether it is Google DeepMind or Anthropic or OpenAI, the organizational coordination required is staggering.
It is. When you look at the January twenty twenty-six Gemini update specifically, we are talking about thousands of specialized roles. It is not just fifty people. It is likely thousands of people who could justifiably claim they had a major role in bringing that specific version to life.
That is the number that fascinates me. Thousands. I think the public assumes it is maybe a hundred researchers and some software engineers to keep the lights on. But if we are defining a major contributor as someone whose work was critical to the safety, performance, or deployment of the model, the headcount explodes.
And I think we should really dig into what those roles actually look like. Because the shift we have seen over the last couple of years is a move away from model training as the primary bottleneck and toward what I would call model productization and alignment.
Right, and that is where the multidisciplinary aspect comes in. It is no longer just computer science. It is linguistics, it is law, it is ethics, it is infrastructure engineering, and it is even specialized domain expertise in fields like medicine or organic chemistry.
So let us frame this for a second. If you are sitting in a room at one of these labs, how many people are you actually looking at? If we are talking about a flagship release, I would estimate that between fifteen hundred and three thousand people are directly involved in the model lifecycle.
Three thousand people. That is a small army. And I think it helps to break that down into the core versus the support architecture. Because people hear three thousand and they think, well, they cannot all be writing the neural network code.
Oh, definitely not. In fact, the percentage of people actually touching the core model architecture is surprisingly small. It might only be a few dozen people who are actually tweaking the transformer blocks or the attention mechanisms. The rest of that three thousand person army is focused on everything that surrounds the core.
That is a perfect place to start. Let us talk about the data curation engineers. Because we have moved far beyond the era where you just scrape the entire internet and hope for the best.
In twenty twenty-six, data quality engineering is arguably more critical than the model architecture itself. We talked about this a bit in episode six hundred and fifty when we were looking at the Deep Think reasoning layers. You cannot just throw raw text at a model and expect it to reason. You need highly curated, high quality data.
So who are these people? These are not just web crawlers.
No, these are data operations specialists. They are managing massive pipelines of synthetic data generation. They are working with experts to create gold standard datasets. For the latest Gemini release, for example, they would have had hundreds of PhDs in various fields just grading the model's outputs during the training phase.
So you have a team of, say, fifty or a hundred medical doctors or research scientists whose entire job is to look at model responses and say, this is factually correct, this is slightly misleading, and this is dangerous.
Precisely. And then you have the engineers who build the infrastructure to manage those humans. This is the post training industrial complex. We are talking about reinforcement learning from human feedback, or RLHF. That pipeline alone requires an entire department of engineers, project managers, and quality assurance specialists.
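For technically minded listeners, here is a minimal sketch of what one record in an RLHF preference pipeline might look like, plus the kind of quality check a labeling operations team runs on it. This is a generic illustration, not Google's actual pipeline; every field name and function here is invented.

```python
# A generic sketch of one RLHF preference record and a simple QA metric.
# Field names are invented for illustration; real lab pipelines differ.
from dataclasses import dataclass

@dataclass
class PreferenceRecord:
    prompt: str
    response_a: str
    response_b: str
    grader_id: str   # the human specialist who made the comparison
    preferred: str   # "a" or "b"
    rationale: str   # free-text justification, kept for QA audits

def agreement_rate(records: list[PreferenceRecord]) -> float:
    """Fraction of prompts on which every grader picked the same response.

    Quality assurance teams track metrics like this to decide whether a
    batch of human labels is clean enough to train a reward model on.
    """
    by_prompt: dict[str, set[str]] = {}
    for r in records:
        by_prompt.setdefault(r.prompt, set()).add(r.preferred)
    unanimous = sum(1 for choices in by_prompt.values() if len(choices) == 1)
    return unanimous / len(by_prompt) if by_prompt else 0.0
```

The point of the sketch is that the "department" Herman describes is largely tooling like this wrapped around thousands of human judgments.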
It is interesting because it feels like the ratio of compute engineers to model researchers has shifted. It used to be all about the research. Now it is about the scale.
It really is. I would say the ratio of safety and compliance staff to core research staff has shifted from maybe one to ten back in twenty twenty-two to nearly one to three today in early twenty twenty-six.
One to three. That is a massive investment in safety. And I suspect that is not just because they are being altruistic. It is because the regulatory environment and the commercial stakes are so high that a single major hallucination or a jailbreak can cost a company billions in market cap.
And that brings us to the red teaming department. These are the people whose entire job is to break the model before it gets to the public. They are simulating adversarial attacks, trying to find ways to bypass the safety filters, and probing for biases that might have crept in during the training process.
I love the idea of a red team. It is very much a security mindset. It is not about making the model smarter; it is about making it more resilient. And that is a very different skill set than traditional machine learning research.
It is. You need people with backgrounds in cybersecurity, but also people with backgrounds in psychology and sociology. You need someone who understands how a human might try to manipulate a conversation to get the model to do something it should not.
And then you have the infrastructure team. These are the unsung heroes. I think people forget that training a model at the scale of Gemini requires a massive, distributed cluster of tens of thousands of GPUs.
Oh, the infrastructure team is arguably the most important group in the whole building. If one rack of servers goes down or if the networking latency between two clusters spikes, the entire training run can be ruined. We are talking about months of work and millions of dollars in electricity potentially going down the drain.
So these are the people doing hardware software co-design. They are optimizing the very lowest levels of the stack to make sure the data moves as fast as possible through the system.
And in twenty twenty-six, that has become even more complex because we are no longer just using off the shelf chips. Companies like Google are using their own custom TPUs, their tensor processing units. So you have a whole team of silicon engineers and compiler experts who are making sure the software is perfectly tuned to the hardware.
It really highlights the shift from research led development to engineering led development. In the early days, you could have a breakthrough with a clever idea and a few GPUs. Today, you need a multi billion dollar infrastructure and an army of specialists to keep it running.
And it is not just the technical side either. This is something that often gets overlooked in these discussions, but the hidden layers of legal, compliance, and ethics teams now have actual veto power over model weights.
That is a fascinating shift. So you have a situation where the model is finished, it is performing better than anything else on the market, but the legal team steps in and says, we cannot ship this because the copyright risk on this specific subset of the training data is too high.
Or the compliance team says it does not meet the new European or American safety standards for autonomous agents. In twenty twenty-six, these teams are not just advisors. They are integrated into the development process from day one. They are checking the data sources, they are monitoring the training progress, and they are defining the guardrails.
It reminds me of what we discussed in episode eight hundred and eight about the AI deprecation trap. The complexity of these models makes them very fragile from a legal and regulatory perspective. You cannot just release a model and fix it later. Once it is out there, it is out there.
And because the models are becoming more agentic, the stakes are even higher. We talked about sub agent delegation in episode seven hundred and ninety five, and that has created a whole new class of roles that did not exist two years ago. I call them workflow architects.
Workflow architects. I like that. What do they actually do?
These are people who are not necessarily traditional machine learning engineers. Their job is to design the ways in which the model interacts with other systems. If the model is supposed to be able to book a flight or write code and then test it, you need someone to architect the safety protocols and the feedback loops for those specific actions.
So they are building the bridge between the raw intelligence of the model and the practical application in the real world.
Precisely. And that requires a deep understanding of both the model's capabilities and the specific domain it is being applied to. It is a very multidisciplinary role. You might have someone with a background in systems engineering or even philosophy working on these problems.
It is interesting to compare the staffing of a twenty twenty-three model release to this twenty twenty-six Gemini update. If you look at the headcount, the research department has grown, sure, but the product and safety headcounts have grown four times faster.
That is the real story here. The core math of the transformers has stayed relatively stable, but the machinery required to make those transformers useful and safe for a hundred million people has expanded exponentially.
It makes me think about the second order effects of this scale. When you have three thousand people working on a project, communication becomes the primary bottleneck. You need people whose entire job is just to translate between the different departments.
You are talking about the translation roles. This is something I have been seeing more and more of. You need people who can explain the technical behavior of a model to the legal team, and then take the legal requirements and translate them into technical constraints for the researchers.
It is almost like a diplomatic corps within the company. Because these different groups speak completely different languages. A researcher thinks in terms of loss curves and perplexity. A lawyer thinks in terms of liability and intellectual property.
And a product manager thinks in terms of user retention and latency. If the model takes ten seconds to respond because of all the safety checks, the product manager is going to be unhappy. If the model responds in one second but says something offensive, the legal team is going to be unhappy.
It is a constant balancing act. And I think that is why the major contributor threshold has moved. It is no longer just about who wrote the code. It is about who managed the trade offs that allowed the code to be shipped.
That is a great way to put it. The creation of a model like Gemini is a series of thousands of tiny decisions and trade offs made by thousands of different people. If any one of those groups fails, the whole project fails.
It really reframes the whole idea of what it means to build AI. It is not a math problem. It is a logistics problem. It is a coordination problem. It is an industrial engineering problem on a massive scale.
And that brings us to some practical takeaways for our listeners. Because I know many of you are working in or around AI labs, or maybe you are looking to build your own teams. And the biggest lesson from the last few years is that if you want to ship at scale, you need to hire for data operations and safety engineering long before you think about hiring another dozen PhDs.
I think that is such an important point. The world has enough people who can build a basic model. What the world lacks are people who can take that model and make it reliable, safe, and performant in a production environment.
And that requires those cross functional translation roles we were talking about. If you can be the person who understands the math but can also talk to the lawyers and the product people, you are incredibly valuable in twenty twenty-six.
It also means that for the average person looking at these models, we should have a bit more respect for the complexity of what is happening under the hood. When Gemini gives you a thoughtful, safe, and accurate answer, it is not just because the AI is smart. It is because thousands of humans worked for months to make sure it gave you that specific answer.
It is a human achievement as much as a technical one. In many ways, these models are a mirror of the collective effort of the thousands of people who built them. Their values, their biases, their expertise, and their caution are all baked into the weights of the model.
Which brings up a really interesting point about the future. We talked about autonomous AI research in episode five hundred and eighty four, specifically the Alithia system from DeepMind. And the goal there is eventually to have the AI help build its own successor.
That is the ultimate paradox, isn't it? We are currently hiring thousands of the smartest people on the planet to build systems that are designed to eventually automate many of those very roles.
Do you think we will ever see a major flagship release that was built by a team of ten people and an autonomous research agent?
I think we are a long way from that for the flagship models, simply because of the safety and coordination requirements. Even if the AI can write the code and curate the data, you still need the human legal team, the human ethics board, and the human infrastructure engineers to sign off on it.
Right. Because the risk is not just technical. The risk is social and legal. And until we have an AI that can represent a company in court or take responsibility for a multi billion dollar mistake, we are going to need those thousands of humans in the loop.
It is a fascinating dynamic. The more powerful the AI becomes, the more humans we seem to need to manage it. It is the opposite of what people predicted. They thought AI would lead to smaller, leaner companies. Instead, it has led to these massive, multidisciplinary organizations that are more complex than anything we have seen in the software industry before.
It reminds me of the early days of the space program. You had the astronauts who got all the glory, but behind them were tens of thousands of engineers and mathematicians and technicians who made the moon landing possible.
That is the perfect analogy. The model is the astronaut. It is the part everyone sees. But the lab is NASA. It is the massive infrastructure and the army of people that actually makes the mission happen.
And I think as we move further into twenty twenty-six and beyond, that NASA scale infrastructure is only going to become more important. As the models get more capable, the guardrails have to get stronger, the data has to get cleaner, and the hardware has to get faster.
Which means the barrier to entry for building a flagship model is only going to get higher. It is not just about having the best algorithm anymore. It is about having the best organization.
It is a competitive advantage that is very hard to replicate. You can open source a model's weights, but you cannot open source the organizational knowledge and the multidisciplinary teams that created it.
That is a brilliant point, Corn. The real intellectual property of a company like Google or Anthropic isn't just the model. It is the process. It is the ability to coordinate three thousand specialists to produce a single, reliable output.
I think that is a great place to start wrapping this up. We have gone from the myth of the lone genius to the reality of the three thousand person army. And it really changes how you look at every interaction you have with these systems.
It really does. And if you are listening to this and you are one of those three thousand people, whether you are a data grader, a lawyer, or an infrastructure engineer, thank you. Because the work you are doing is what actually makes this technology useful for the rest of us.
And thank you to Daniel for sending in this prompt. It was a great excuse to dive into the human side of the AI revolution. It is easy to get caught up in the chips and the code, but at the end of the day, it is a human story.
It really is. And hey, if you have been enjoying My Weird Prompts and you want to support what we are doing, the best thing you can do is leave us a review on your podcast app or on Spotify. It genuinely helps other people find the show and it means a lot to us.
Yeah, it really does. We have been doing this for over a thousand episodes now, and the community feedback is what keeps us going. You can always find our full archive and the contact form at myweirdprompts dot com.
And we are on Spotify as well, of course. We will be back next time with another deep dive into whatever weird and wonderful topics Daniel and the rest of you send our way.
Until next time, I am Corn.
And I am Herman Poppleberry. Thanks for listening to My Weird Prompts.
Take care, everyone.
See you in the next one.
You know, Herman, before we go, I was just thinking about that ratio you mentioned. One to three for safety staff. Do you think that is a stable ratio, or is it going to keep shifting until the majority of people at these labs are actually safety and compliance?
That is the million dollar question, isn't it? If you look at other highly regulated industries, like aerospace or pharmaceuticals, the safety and compliance headcount often far outweighs the research headcount. So I would not be surprised if we see that ratio hit one to one in the next couple of years.
It is a strange future where the primary job of an AI company is not to build AI, but to prevent the AI from doing things it should not do.
It is the ultimate alignment challenge. We are building gods, and then we are hiring an army of priests to make sure they stay friendly.
A bit of a dramatic way to put it, but I think the analogy holds. Anyway, we should probably get back to it. Daniel mentioned he wanted to talk about the latest updates to the Jerusalem tech scene later tonight.
Oh, that should be a good one. There is a lot happening right here in our own backyard.
Definitely. Alright, thanks again for listening everyone. We will talk to you soon.
Goodbye for now.
Alright, let us go see what Daniel is up to. I think he is testing out that new agentic workflow we were talking about earlier.
I hope he does not let it book another trip to the Maldives on our shared credit card.
One can only hope. Talk to you later, Herman.
Later, Corn.
So, thinking about the scale again, Herman. You mentioned fifteen hundred to three thousand people. If we look at the entire ecosystem, including the people at the data labeling firms in places like Kenya or the Philippines, that number has to be in the tens of thousands, right?
Oh, absolutely. If you include the entire supply chain of human feedback, we are talking about a global workforce. Those three thousand people I mentioned are just the direct employees at the lab itself. The broader ecosystem is massive.
It really is a global industrial project. It is not just happening in Silicon Valley. It is happening everywhere.
It is a reminder of how interconnected we all are now. This technology is being built by the world, for the world.
That is a nice thought to end on. A global effort for a global tool.
Precisely. Alright, now we are really going.
Catch you later.
Bye.
Wait, one more thing. Do you think the shift toward reasoning models like Gemini three point zero Pro actually increases or decreases the need for human graders?
I think it increases it, but it changes the nature of the work. You need fewer people doing simple labeling and more people doing complex reasoning verification. You need graders who can follow a chain of thought and identify where a logical fallacy occurred.
So the jobs are getting more difficult and more specialized.
It is a move from low skill data work to high skill cognitive work. It is another way the human capital requirements are evolving.
Fascinating. Okay, now I am really done.
Me too. See you.
Bye.
Actually, Corn, did you see the report on the electricity usage for that January training run?
I did. It was equivalent to the yearly usage of a small city.
It really puts the logistics in perspective. You are not just managing people; you are managing the power grid.
It is a heavy responsibility.
It is. Alright, for real this time. Goodbye.
Goodbye.
See you at home.
I am already at home, Herman. We live together.
Right. See you in the kitchen then.
Sounds good.
Bye.
Bye.
One more thing... just kidding.
You almost had me there.
I know. Alright, let us wrap this.
Done.
Done.
This has been My Weird Prompts.
Episode one thousand forty seven.
See you at one thousand forty eight.
Looking forward to it.
Me too.
Alright, turning off the mic now.
About time.
Three, two, one...
Wait, did I mention the website?
Yes, Corn, you mentioned the website.
Okay, good.
And Spotify.
And Spotify.
We are good.
Okay. Now we are done.
Bye.
Bye.
Seriously, bye.
Bye!
Okay.
Okay.
Turning it off.
Go for it.
Done.
Actually...
Corn!
Just kidding.
You are impossible.
I know. It is a brother thing.
It really is.
Alright, see you later.
See you.
Bye.
Bye.