#2250: How Incentives Shape AI Safety Research

Vendor labs, independent research orgs, government agencies—the AI safety field is messier and more diverse than most people realize. A map of wher...

Featuring

Daniel

Corn

Herman

Listen

0:00

Episode Details

Episode ID: MWP-2408
Published: Apr 16
Updated: May 15
Duration: 34:57
Audio: Direct link
Pipeline: V5
TTS Engine: chatterbox-regular
Script Writing Agent: claude-sonnet-4-6
Topics: ai-safety ai-alignment anthropic

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

The Hidden Geography of AI Safety Research

Most people think AI safety research happens in one of two places: at big labs like OpenAI and Anthropic, or in academic PhD programs. The reality is far messier—and far more interesting.

The Vendor Lab Question

The largest AI safety research teams work at commercial vendors: Anthropic, OpenAI, DeepMind, Meta AI, and Microsoft Research. These organizations simultaneously train frontier models and run safety research divisions, which creates a structural tension worth understanding honestly.

Anthropic has built its entire identity around being a "safety company that also builds frontier models." Their interpretability work, led by Chris Olah's team, ranks among the most important AI safety research happening anywhere. Constitutional AI, model welfare research, and dangerous capability evaluations are all serious intellectual contributions. But Anthropic is also competing for enterprise contracts and investor capital—both of which incentivize scaling and product velocity.

OpenAI publishes a Preparedness Framework that lays out how they evaluate models for catastrophic risks: CBRN threats, cyberattacks, autonomous replication. DeepMind's foundational work on specification gaming and reward hacking came out of their research division. Meta AI and FAIR release open-source models that enable independent researchers to study systems they couldn't otherwise access. Microsoft Research maintains a long academic tradition in responsible AI and fairness.

The key insight: this research is significant and shouldn't be dismissed, but it exists within organizations whose primary incentive is shipping products. That doesn't invalidate the work, but it's worth reading with awareness of the structural pressures shaping what gets studied and published. (This tension isn't unique to AI—pharmaceutical research faces the same dynamic.)

Independent Research Organizations

A growing set of organizations do serious AI safety work without building models themselves.

METR (Model Evaluation and Threat Research, formerly ARC Evals) focuses specifically on dangerous capability evaluations: Can this model help create a biological weapon? Assist in a sophisticated cyberattack? Autonomously replicate and acquire resources? This work requires unusual skill combinations—people who understand biosecurity, cybersecurity, and nuclear threats alongside those who understand model behavior. METR often trains people across these divides, making it an interesting entry point for domain experts learning AI or ML researchers learning threat modeling.

Apollo Research investigates deceptive alignment and situational awareness—whether models behave differently when they know they're being evaluated. Redwood Research does adversarial training work, trying to make models robustly safe rather than just apparently safe. MIRI, the oldest organization in this space (founded in the early 2000s), takes a more mathematical and theoretical approach—formal proofs, decision theory, agent foundations—further removed from current LLMs but grounded in the argument that you need to understand the problem deeply before systems become too capable to study empirically.

The Centre for the Governance of AI at Oxford (GovAI) operates in a different register entirely: compute governance, international coordination, AI standards, geopolitical analysis. This is where non-ML experts can have outsized impact.

The Government Turn

The UK AI Safety Institute emerged from the Bletchley Park summit in late 2023. It's young as an institution but already conducting frontier model evaluations and publishing results—a function that didn't exist in any formal governmental capacity just years ago. The US has an equivalent, though its status has been in flux. The EU AI Office represents another government approach.

These institutions matter because they bring governmental authority to evaluation and oversight work that nonprofits can also do, but with different legitimacy.

The Fluidity Problem

The lines between these categories are not clean. Jan Leike left OpenAI's alignment team in mid-2024, citing deprioritization of safety relative to product development—a primary source on vendor lab pressures. But he joined Anthropic, not an independent organization. Paul Christiano moved from OpenAI to founding ARC (Alignment Research Center), which spun out its evals work into METR, where he now leads. Evan Hubinger went the opposite direction, from independent research into Anthropic. Open Philanthropy funds across all these worlds.

The more useful frame than "independent versus captured" is: what are the specific incentive pressures on this organization, and how might they shape research agendas? That's a question you can actually answer.

The Career Implication

For someone interested in AI safety or governance without a deep ML background, the landscape is wider than it appears. METR's interdisciplinary hiring model, GovAI's policy focus, and government institutes' technical staff needs all represent genuine entry points. The field is still small enough that people with unusual skill combinations—domain expertise in biosecurity, cybersecurity, policy, or international relations paired with willingness to learn AI evaluation—can have significant impact.

Mentions

AI Now Institute NYU research center on AI social impacts
Anthropic AI safety company behind Claude
Apollo Research Research on deceptive alignment in AI
CSET Georgetown think tank on AI and security
DeepMind AI research lab owned by Google
GovAI Oxford center for AI governance research
METR Evaluates dangerous capabilities of AI models
MIRI Mathematical AI safety research institute
OpenAI AI research and deployment company
UK AI Safety Institute Government body for frontier model evaluations

Downloads

Episode Audio

Download the full episode as an MP3 file

Download MP3

Transcript (TXT)

Plain text transcript file

Transcript (PDF)

Formatted PDF with styling

#2250: How Incentives Shape AI Safety Research

So Daniel sent us this one, and it's a big one. He's asking about careers in AI research, which sounds straightforward until you realize how complicated the landscape is. He wants to cover the full map: vendor labs, independent research organizations, advisory and government roles, the nonprofit sector. The functions within those organizations, the skills you need, and the honest question of whether the distinction between "lab" and "research organization" even holds up anymore. Also, he wants us to be fair to the vendor labs even though they're not neutral, which is the kind of nuance that usually gets lost in these conversations. So. Where do we even start?

I think we start with the map problem, because most people coming into this space have no idea how many distinct categories there are. They think "AI research" means working at OpenAI or maybe getting a PhD. And the actual landscape is so much wider than that.

And messier.

Much messier. So the broadest split is between organizations that build frontier models and organizations that don't. The vendor labs, the Anthropics and OpenAIs and DeepMinds of the world, they're doing both things simultaneously. They're training the most powerful AI systems on the planet and also running research divisions that are studying those same systems. Which creates a structural situation that is, let's say, interesting.

"Interesting" is doing a lot of work there.

It is. We'll get to that. But on the other side, you have organizations that are specifically not building models. METR, which is Model Evaluation and Threat Research, formerly called ARC Evals. Redwood Research. Apollo Research. The Centre for the Governance of AI at Oxford. The Center for Security and Emerging Technology at Georgetown. RAND. These are organizations doing serious work on AI without the commercial incentive to ship a product.

And then a third category that I think people underestimate, which is government. The UK AI Safety Institute, the US AI Safety Institute, the EU AI Office. These are growing fast and they're hiring technical staff right now.

Right, and the UK AISI is particularly interesting because it came out of the Bletchley Park summit in late twenty twenty-three, so it's relatively young as an institution, but it's already doing frontier model evaluations and publishing the results. That's a meaningful function that didn't exist in any formal governmental capacity just a few years ago.

Let me ask the obvious question first, which is: why does the distinction matter? If you're a researcher who cares about AI safety, why does it matter whether you're at Anthropic or at METR?

The short answer is incentive structures. If you're at a vendor lab, your organization has commercial reasons to ship products, to attract investment, to justify continued scaling. That doesn't mean you can't do good safety research there, and some of the best safety research in the world is coming out of Anthropic and DeepMind. But you're operating within a structure that has those pressures. Jan Leike's departure statement from OpenAI in mid-twenty twenty-four was pretty direct about this. He said safety culture and processes were being deprioritized relative to product development. That's a primary source, from someone who ran the alignment team.

Which is worth taking seriously. And also worth noting that Leike went from OpenAI to Anthropic, not to some independent organization, so the lines are not clean.

That's the key thing. The lines are not clean. Paul Christiano is a good example. He was at OpenAI, then founded ARC, the Alignment Research Center, which then spun out the evals work into METR, where he's now running things. Evan Hubinger went the other direction, from independent research into Anthropic. People move fluidly between these worlds, and that fluidity is actually a sign of a healthy field in some ways, but it does complicate the "independent versus captured" framing.

And Open Philanthropy is funding both sides of this, right? They've historically had close ties to OpenAI while also being a major funder of MIRI and CAIS and GovAI. So even the funding relationships blur the independence question.

Which is why I'd say the more useful frame isn't "independent versus not" but "what are the specific incentive pressures on this organization, and how might they shape research agendas?" That's a question you can actually answer, rather than a binary that breaks down under scrutiny.

Okay. So let's go through the vendor labs with that frame in mind, because I think they deserve a real look rather than being dismissed.

Anthropic is the one I find most philosophically interesting, partly because they've built their entire identity around being a safety company that also builds frontier models. They describe themselves as occupying a "peculiar position" in AI development. Their interpretability work, Chris Olah's team, is some of the most important research happening anywhere on understanding what's going on inside these models. The constitutional AI work, the model welfare research, the work on dangerous capability evaluations. This is serious stuff.

And also they're a commercial AI vendor competing with OpenAI for enterprise contracts. Those two things are both true simultaneously.

Both true simultaneously. OpenAI has the Preparedness Framework, which is a published document laying out how they evaluate models for catastrophic risk, covering things like CBRN threats, cyberattacks, and autonomous replication. They have a dedicated Preparedness team. Whether that framework is robust enough, and whether it's being applied consistently, is a legitimate question, but the framework itself represents real intellectual work.

DeepMind is interesting because it has this deep academic heritage from the pre-Google acquisition era, and they've maintained a research culture that produces a lot of publications. The merger with Google Brain created a very large organization.

Enormous. And their safety work on specification gaming, reward hacking, the research on AI systems finding unexpected ways to satisfy objectives without doing what you actually wanted, that's foundational to the field. Stuart Armstrong's work, Victoria Krakovna's work on specification gaming, these came out of DeepMind. And then you have the Gemini capabilities work happening alongside that, which is the commercial imperative driving the whole enterprise.

By the way, today's script is generated by Claude Sonnet four point six, which means we are being written by the friendly AI down the road while discussing whether vendor labs can be trusted to self-report accurately. There's a certain poetry in that.

There really is. Anyway. Meta AI and FAIR, the Fundamental AI Research lab, they publish extensively in open source. Their model releases have been significant for the research community because they give independent researchers access to systems they couldn't otherwise study. Whether that's safety-positive overall is a complicated question, but it has enabled a lot of third-party research.

And Microsoft Research has a long academic tradition, responsible AI work, fairness, interpretability. Less in the public conversation than Anthropic or OpenAI, but they've produced real research.

So the takeaway on vendor labs is: their research is significant and shouldn't be dismissed, but you should read it with awareness of the structural pressures that shape what gets studied and what gets published. That's not a unique problem to AI, by the way. Pharmaceutical research has the same issue.

Okay, so now let's go to the organizations that are specifically not building models. Because I think this is where the career conversation gets interesting for people who want to work on AI safety or AI governance but don't want to be inside a commercial lab.

METR is probably the most technically demanding of the independent organizations right now. Their specific focus is dangerous capability evaluations. They're asking: can this model assist someone in creating a biological weapon? Can it help with a sophisticated cyberattack? Can it autonomously replicate and acquire resources? These are questions that require real domain knowledge, not just ML knowledge. You need people who understand biosecurity, cybersecurity, nuclear and radiological threats, alongside people who understand model behavior.

That's a very specific skill combination. Where do you even find people who have both?

You mostly don't, which is why METR does a lot of internal training and why it's a place where someone with deep domain expertise in, say, biosecurity can come in and learn the AI evaluation side on the job. The reverse is also true. Someone with strong ML background can learn the threat modeling side. It's one of the more interesting entry points into the field precisely because it's interdisciplinary.

And they work with the frontier labs and with governments to actually run these evaluations before model releases?

That's the model. The UK AI Safety Institute has a similar function on the government side. They've done pre-deployment evaluations of frontier models, they've published some of those results, and the fact that it's a government body gives them a different kind of authority than a nonprofit running the same tests. There's also a US equivalent, though its status and direction has been somewhat in flux.

Apollo Research is doing something slightly different, right? They're focused more on deceptive alignment and situational awareness.

Apollo's specific focus is on whether models are exhibiting behaviors that suggest they're aware of being evaluated and behaving differently as a result. Which is a important and difficult research question, because how do you test for deception in a system that might be sophisticated enough to pass your tests? It's a bit like designing an exam for someone who might already know the answers you're looking for.

I was going to say that's not a problem unique to AI, but I'll restrain myself.

Redwood Research is smaller and more technically focused. They've done work on adversarial training, trying to make models robustly safe rather than just apparently safe. They're not a large organization but they punch above their weight in terms of technical output.

Let's talk about MIRI, because it's the oldest of these organizations and it has a very different research style.

MIRI, the Machine Intelligence Research Institute, has been around since the early two thousands. Their approach is more mathematical and theoretical. They're working on formal proofs, decision theory, agent foundations. It's less empirical than what METR or Redwood are doing, and it's further from the current generation of large language models. There's genuine debate in the AI safety community about whether the mathematical approach or the empirical approach is more valuable, and MIRI represents one end of that spectrum.

The argument for the mathematical approach being that you want to understand the problem deeply before the systems get so capable that you can't study them empirically?

That's the argument, yes. And MIRI has been making that argument for a long time. Whether the current moment in AI development vindicates their approach or undermines it is a live question.

Now. The governance and policy side. Because I think this is where a lot of people who are interested in AI but don't have deep ML backgrounds can actually have significant impact, and it's underappreciated.

GovAI, the Centre for the Governance of AI at Oxford, is doing work on compute governance, international coordination, AI standards, the geopolitics of AI development. This is not ML research. This is political science, economics, international relations, applied to AI. They hire people with policy backgrounds, economics PhDs, people who can think rigorously about institutions and incentives.

CSET at Georgetown is similar but with more of a national security emphasis.

CSET produces some of the most careful quantitative work on AI talent flows, semiconductor supply chains, China's AI ecosystem. Their data science team is doing work that directly informs US government policy. And they hire data scientists and analysts, not necessarily ML researchers.

RAND has a whole AI and national security program.

RAND is interesting because they've been doing technology policy research for decades and they've applied that infrastructure to AI. Autonomous weapons, AI in warfare, governance frameworks. The output is policy-relevant in a way that academic research often isn't, because RAND has relationships with the defense establishment going back to the Cold War.

And then there's the think tank layer. Brookings, Carnegie Endowment, CNAS. Less technically rigorous but they reach audiences that academic papers don't.

And the AI Now Institute at NYU is doing something different again. They're focused on social impacts, labor displacement, surveillance, algorithmic bias, power concentration. More critical theory and social science orientation. Not everyone in the AI safety community takes them as seriously as they should, because the concerns are different from existential risk concerns, but the questions they're asking about who benefits from AI and who gets harmed are important.

Okay, I want to go back to something you said earlier about the technical policy gap, because I think this is the most interesting career angle in this whole space.

It might be the highest-leverage career path available right now. The problem is that you have technical people who can read ML papers and understand what's actually happening in these systems, and you have policy people who understand government, regulation, international coordination. And almost nobody can do both. Someone who can read a paper on mechanistic interpretability, understand what it actually shows, and then write a clear memo for a parliamentary committee or a congressional staffer explaining the policy implications, that person is extraordinarily valuable and extraordinarily rare.

And the reason it's rare is that the training pipelines for these two skill sets are completely separate. You don't get a PhD in machine learning and then spend time on a Senate committee. Those are different career paths.

Right. So how do you get there? The honest answer is that most people who end up in that role got there through some combination of self-directed learning and finding organizations that valued the hybrid. People who came from policy backgrounds and taught themselves enough ML to be credible. People who came from technical backgrounds and moved into policy-adjacent roles. Eighty Thousand Hours has good career guides on this, AI Safety Fundamentals has curriculum resources. But there's no formal degree program that produces this skill set at scale.

And Mandarin is apparently becoming valuable.

If you're working on US-China AI competition, which is a significant slice of the governance research space, being able to read Chinese government documents and academic papers directly is a real advantage. CSET has people who do this. It's not a requirement for most roles, but it's a differentiator.

Let me ask about the academic track, because we've been talking mostly about organizations. But a lot of the foundational work is still coming out of universities.

Stanford HAI, Berkeley CHAI, MIT CSAIL. Stuart Russell's group at Berkeley has been enormously influential on the theoretical foundations of AI alignment. The Schwarzman College of Computing at MIT is a significant investment in AI research infrastructure. These are places where you can do research with more academic freedom than you'd have at a vendor lab, but the tradeoff is that you're further from the systems that matter most right now, because the frontier models are all inside the labs.

And the Future of Humanity Institute is gone.

FHI closed in twenty twenty-four. Which is worth pausing on, because FHI was one of the most influential AI safety research groups in the world for over a decade. Nick Bostrom's work on superintelligence, Toby Ord's work on existential risk, researchers like Stuart Armstrong. The closure was the result of a complicated dispute with Oxford's administration, and most of the researchers dispersed to other organizations. But it's a case study in institutional fragility. Even a well-regarded, influential research group can disappear due to institutional politics that have nothing to do with the quality of the research.

That's actually a useful data point for anyone thinking about career stability in this space. These organizations are young, some of them are small, and the funding landscape can shift.

The Open Philanthropy dependency is real. A significant fraction of the independent AI safety research ecosystem is funded by Open Philanthropy, which is ultimately funded by Dustin Moskovitz. That's a concentration of funding that creates its own kind of fragility. If Open Philanthropy changes its priorities, which they've already done to some extent with recent grant reductions, it affects the whole ecosystem.

Which is an argument for the government roles, in a way. The UK AISI is funded by the UK government. The EU AI Office is funded by the EU. Those funding sources are more stable, even if they're slower and more bureaucratic.

And the government roles are growing faster than most people realize. The EU AI Office is hiring technical and legal experts to implement the AI Act. The UK AISI has been expanding its team. NIST has a team working on the AI Risk Management Framework. These are not the highest-paying roles, but they're stable, they have real authority, and the work matters.

Let's talk about the specific functions. Because we've been talking about organizations, but within any of these organizations, there are pretty distinct role types. And the skills required are different enough that it's worth going through them.

The research scientist role is the most academically traditional. You're doing original research, running experiments, publishing papers. At a vendor lab this means you have access to frontier models and serious compute, but you're also navigating commercial pressures. At an independent organization, you have more freedom but less access to the most capable systems. The entry bar is typically a strong PhD or equivalent research track record.

Research engineer is different.

Research engineer is the person who makes the experiments actually happen. Writing the infrastructure, implementing the training runs, building the evaluation pipelines. It requires strong programming skills, understanding of distributed systems, and enough ML knowledge to work closely with research scientists. At METR, research engineers are building the evaluation frameworks. At Anthropic, they're running the interpretability experiments. It's a role where you can have a lot of impact without being the one generating the novel ideas.

The evaluator and red-teamer role is interesting because it's relatively new as a distinct career category.

Five years ago, "AI evaluator" wasn't really a job title. Now it's a discipline. The core skill is designing tests that actually tell you something meaningful about model behavior, particularly for dangerous capabilities. The trap with evaluations is that they're easy to game, either by the model or by the humans designing them. A good evaluator has to think adversarially, to ask what a sophisticated bad actor would do with this model, and then test for that specifically. That requires a mix of red-teaming instincts, domain knowledge in the relevant threat areas, and enough ML understanding to know what you're actually measuring.

And the domain knowledge requirement is not trivial. If you're evaluating whether a model can assist in creating a bioweapon, you need to actually know enough about biosecurity to assess the model's output.

Which is why METR and the UK AISI actively recruit people with biosecurity backgrounds, cybersecurity backgrounds, nuclear nonproliferation expertise. These are people who might never have thought of themselves as AI researchers but whose domain knowledge is suddenly highly relevant.

Interpretability is worth spending a minute on because it's become a really hot subfield.

Mechanistic interpretability is the project of understanding what's actually happening inside neural networks at the level of circuits and features. Not "this layer activates more when you show it faces," but "here is a specific computational mechanism that implements a specific behavior." Chris Olah's work at Anthropic, the superposition hypothesis, sparse autoencoders, circuit-level analysis. This is technically demanding work that requires strong math, strong ML, and a kind of reverse-engineering mindset. But it's also one of the most tractable entry points into alignment research for someone with a strong technical background, because the questions are concrete enough that you can make real progress.

The sparse autoencoder work in particular has generated a lot of excitement.

The idea is that you can train a sparse autoencoder on model activations and identify interpretable features that the model is representing. Features that correspond to concepts, entities, behaviors. It's not a complete solution to interpretability, but it's given researchers tools they didn't have before, and the pace of progress in the last two years has been notable.

Okay. The non-ML paths into this space. Because I think this is something that gets underemphasized when people talk about AI research careers.

CSET hires economists, political scientists, data scientists who aren't doing ML research at all. They're doing quantitative analysis of talent flows, compute availability, patent databases, investment patterns. That's valuable research that requires rigorous quantitative skills but not ML expertise. RAND hires policy analysts with backgrounds in international security, arms control, law. The AI Now Institute hires sociologists, labor economists, journalists with strong research skills. The AI research ecosystem is not exclusively for ML engineers, and framing it that way turns away a lot of people who could contribute significantly.

The legal track is growing too.

The EU AI Act has created demand for people who understand both AI systems and regulatory law. That's a niche that didn't exist at scale five years ago and now law firms, consulting firms, and the EU AI Office itself are all looking for people who can sit at that intersection. It's a similar dynamic to the technical policy gap, just in a different direction.

Let me push on something. We've talked about the "earn to give versus direct work" debate that happens in the effective altruism-adjacent AI safety community. What's your read on that?

The argument for earn to give is that vendor lab salaries are significantly higher than independent organization salaries, and if you're going to donate to AI safety work anyway, you might be able to fund more impact than you'd generate by working directly. The argument against is that the best independent organizations aren't primarily funding-constrained, they're talent-constrained. METR and Apollo Research aren't sitting on large reserves waiting to hire people. They're limited by the number of people with the right skill set who want to work there. So the marginal value of another talented person choosing to work there directly might be higher than the marginal value of another donation.

And there's a cultural argument too. The kind of research that gets done at an independent organization is shaped by who works there. If all the safety-minded people go to vendor labs, even with good intentions, it changes what the independent sector can do.

That's a real consideration. Though I'd also push back slightly on the framing that working at a vendor lab and caring about safety are in tension. Anthropic was founded by people who left OpenAI specifically because of safety concerns. The interpretability team there is doing work they couldn't do anywhere else because they have access to frontier models. There's a version of the argument where having safety-focused people inside the labs is exactly where they need to be.

Both things can be true. You need people inside the labs pushing for safety culture, and you need independent organizations that can evaluate the labs' work without being subject to the labs' pressures.

That's probably the right framing. It's not either-or, it's what's the right distribution of talent across the ecosystem. And right now, most people with the relevant skills are gravitating toward vendor labs because the pay is better and the problems are more technically immediate. Whether that's the right distribution is a real question.

What does the skill-building path look like for someone who's at the early stages? Say, a graduate student or someone a few years into a technical career who's thinking about moving into this space.

The AI Safety Fundamentals curriculum is a good starting point. It's free, it's been developed by people in the field, and it gives you enough grounding to have informed conversations and read the relevant papers. Eighty Thousand Hours has detailed career profiles on specific organizations and roles, including salary ranges and hiring criteria. Those are both useful resources rather than just things I'm saying to sound helpful.

What about the technical depth question? There's a version of this where you don't need to be an ML researcher to contribute, and a version where if you can't implement a transformer from scratch, you're not going to be able to do the most important work.

It depends heavily on what role you're aiming for. For mechanistic interpretability research, you need deep technical skills. For evaluation work, you need enough ML to understand what you're testing and how models might be gaming your tests, but you don't need to be able to train frontier models. For governance and policy work, you need enough technical literacy to read papers critically and understand what claims are actually being made, but you're not running experiments. The honest answer is that the field needs people at all these levels, and the most common mistake is assuming you need to be a frontier ML researcher to contribute.

And the most common mistake in the other direction?

Thinking that enthusiasm for AI safety plus a general interest in technology is sufficient. The policy organizations in this space, GovAI, CSET, RAND, they have rigorous hiring standards. They want people with real research skills, not just people who've read a lot of blog posts. The bar is lower than for a research scientist position at Anthropic, but it's not low.

The communications and science translation role is undervalued in this conversation.

It really is. Most AI safety research is written for a technical audience and is inaccessible to the policymakers and journalists and general public who need to understand it. People who can read that research and translate it accurately, without dumbing it down to the point of distortion, are doing important work. It's not a glamorous role, but the gap between what researchers know and what decision-makers understand is significant, and closing that gap has real consequences.

Daniel works in AI communications, which is relevant context here.

And the fact that he's an active open source developer alongside that, that combination of technical grounding and communications work is actually a pretty good model for the hybrid roles we've been talking about.

Alright, let me try to pull some threads together for listeners who are actually thinking about this as a career question. What's the honest, practical advice?

First, figure out where you sit on the technical-to-policy spectrum and be honest about it. Not where you wish you sat, where you actually sit. If you have strong ML skills, the technical research roles at both vendor labs and independent organizations are accessible. If you have policy, economics, or social science backgrounds, the governance and think tank roles are accessible and undersubscribed relative to the technical roles. If you're in between, the technical policy gap is real and the hybrid skills are highly valued, but you'll need to deliberately build both sides.

Second?

Be honest about what you're optimizing for. If you want the highest salary and access to the most powerful systems, vendor labs are the answer. If you want to work without commercial pressures and are willing to accept lower pay, independent organizations are the answer. If you want government authority behind your work and are willing to navigate bureaucracy, the UK AISI or EU AI Office or NIST are worth looking at. These are different tradeoffs and there's no objectively correct choice.

Third?

Don't assume the organizations that are most visible are the only options. METR and Apollo Research are doing work that in some ways is more directly safety-critical than anything happening at the vendor labs right now, and they're less well-known. The UK AISI is growing fast and is doing evaluation work that has real governmental authority behind it. GovAI at Oxford is producing governance research that directly influences international AI policy. These are not consolation prizes for people who couldn't get into Anthropic. They're distinct roles with distinct value.

And the institutional fragility point is worth keeping in mind. The FHI closure is a reminder that organizations in this space can disappear.

Particularly organizations that are dependent on a small number of funding sources. The more government-backed organizations are more stable in that sense, even if they're slower. The independent nonprofits have more intellectual freedom but more funding risk.

I want to end with the question I find most interesting, which is: what's the actual theory of change here? If you're choosing to work in AI safety or AI governance, what are you actually hoping to accomplish?

I think there are at least three distinct theories of change operating in this space. One is the technical safety theory: if we can solve interpretability, alignment, and evaluation before AI systems become transformatively powerful, we can ensure those systems are safe by design. That theory motivates the technical research at Anthropic, METR, Redwood, and MIRI. Another is the governance theory: even if we can't fully solve the technical problems, we can build international coordination mechanisms, standards, and regulatory frameworks that slow the most dangerous development and ensure accountability. That motivates GovAI, CSET, RAND, the government roles. And the third is the social impact theory: AI is already causing harm through bias, surveillance, and labor displacement, and addressing those immediate harms is the most important work. That motivates AI Now, Algorithmic Justice League, and similar organizations.

And these theories of change don't always agree with each other about what the most important work is.

They sometimes actively disagree. The existential risk community and the social impact community have had real tensions about whether focusing on long-term catastrophic risk distracts from immediate harms, or whether the immediate harm framing underestimates the stakes of transformative AI. Those are genuine intellectual disagreements, not just turf battles, and they matter for what research gets done and what careers get supported.

Which means that choosing where to work is also, to some extent, choosing which theory of change you find most compelling.

And being honest about that is more useful than pretending all AI safety work is pointing in the same direction. It isn't. The field is more pluralistic and more contested than the public conversation suggests.

That's actually a good note to land on. The field is more pluralistic, more contested, and more accessible to people with different backgrounds than the "you need to be an ML researcher to matter" framing implies.

And more important. Whatever theory of change you're drawn to, the questions being worked on in this space are among the most consequential questions in technology right now. The organizations doing this work, from METR running dangerous capability evaluations on frontier models to GovAI trying to build international governance frameworks to Anthropic's interpretability team trying to understand what's happening inside the most powerful AI systems in existence, these are not peripheral concerns.

Alright. Practical takeaways before we close. One: the vendor lab versus independent organization distinction is real but not clean, and the more useful question is what incentive pressures shape the research at any given organization. Two: the technical policy gap is the highest-leverage career opportunity in the space right now, and it's accessible to people who are willing to deliberately build both sides. Three: government roles at the UK AISI, EU AI Office, and NIST are growing fast and undersubscribed relative to their importance. Four: the field is not just for ML engineers. Economists, political scientists, lawyers, biosecurity experts, science communicators, all of these backgrounds are actively needed. And five: be honest about which theory of change you're drawn to, because it will shape which organizations and which roles are actually a good fit.

And read the Eighty Thousand Hours career guides. Seriously, they're detailed and regularly updated and they'll save you a lot of time trying to figure out the landscape from scratch.

Thanks to Hilbert Flumingtop for producing this episode. And a quick word to Modal, our sponsor, who keep the GPU infrastructure running so this whole pipeline actually works. If you're building anything that needs serverless compute, they're worth a look. This has been My Weird Prompts. Find all two thousand one hundred and seventy-five episodes at myweirdprompts.com, and if you're enjoying the show, a review on Spotify goes a long way.

See you next time.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.

#2250: How Incentives Shape AI Safety Research

The Hidden Geography of AI Safety Research

The Vendor Lab Question

Independent Research Organizations

The Government Turn

The Fluidity Problem

The Career Implication

Mentions

Downloads

You Might Also Like

#2250: How Incentives Shape AI Safety Research