Public statements

What labs should do

Labs, their leadership, their boards, and their staff should be aware of AI risk and should understand that AI safety might be really hard and that risks might be hard to notice.

It’s also good for labs to have accurate beliefs on many topics around AI, including future AI progress, appropriate responses, and the effects of powerful AI.

Leadership’s public statements matter insofar as they affect, or provide evidence about, future lab behavior, leadership and staff attitudes, and other labs.

Public statements are hard to evaluate quantitatively, in part because we aren’t sure what perfect attitudes look like. An ideal evaluation would involve collecting and distilling what labs and their leadership say about AI risk, AI progress, and good actions.

Evaluation

  • (1/5) The lab and its leadership understand extreme misuse or structural risks.
    • (1/5) … and they understand misalignment, that AI safety might be really hard, that risks might be hard to notice, that powerful capabilities might appear suddenly, and why they might need an alignment plan; and they talk about all this.
      • (1/5) … and they talk about it often/consistently.
        • (1/5) … and they consistently emphasize extreme risks.
  • (1/5) Clearly describe a worst-case plausible outcome from AI and state how likely the lab considers it to be, using specific numbers and clarifying terms like “catastrophe” and “extreme.” This statement should appear on the lab’s website, not just be spoken by a leader in an interview, and it should not be contradicted by other statements from the lab or its leadership.

Sources

Statement on AI Risk (CAIS 2023)

  • “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”
  • The CEOs of Google DeepMind, OpenAI, and Anthropic signed.
  • About 59 staff from DeepMind, 28 from OpenAI, 15 from Anthropic, 6 from Microsoft, and 1 from Meta AI also signed.

What labs are doing

Microsoft: Microsoft and its CEO Satya Nadella seem never to talk about extreme risks or the alignment problem. Some other Microsoft leaders seem aware of extreme risks; in particular, CTO Kevin Scott and Chief Scientific Officer Eric Horvitz signed the CAIS letter. Microsoft recently formed Microsoft AI and hired Mustafa Suleyman to lead it; Suleyman occasionally speaks about extreme risks (though he seems not to speak much about alignment) and also signed the CAIS letter.

Google DeepMind: Google and DeepMind as organizations seem to never talk about extreme risks or the alignment problem. DeepMind CEO Demis Hassabis and Chief AGI Scientist Shane Legg sometimes talk about extreme risks and the alignment problem,1 and they signed the CAIS letter, as did COO Lila Ibrahim. But DeepMind and its leadership miss lots of opportunities to talk about extreme risks or the alignment problem; for example, DeepMind’s Responsibility & Safety page doesn’t mention either. Google leadership seems to never talk about extreme risks or the alignment problem. DeepMind does research aimed at preventing extreme risks and solving the alignment problem; the rest of Google basically does not. DeepMind’s safety team has some plans for solving the alignment problem, but DeepMind as an organization does not.

Meta AI: Meta AI and its leadership seem to disbelieve in extreme risks and the alignment problem.2

OpenAI: OpenAI and its CEO Sam Altman sometimes talk about extreme risks and the alignment problem.3 Altman, CTO Mira Murati, Chief Scientist Ilya Sutskever, cofounders John Schulman and Wojciech Zaremba, and board member Adam D’Angelo signed the CAIS letter. OpenAI also makes plans and does research aimed at preventing extreme risks and solving the alignment problem; see our Alignment plan and Alignment research pages.

Anthropic: Anthropic and its CEO Dario Amodei often talk about extreme risks and the alignment problem, including that nobody knows how to control powerful AI systems.4 Dario Amodei, President Daniela Amodei, cofounders Jared Kaplan and Chris Olah, and board member Luke Muehlhauser signed the CAIS letter. Anthropic also makes plans and does research aimed at preventing extreme risks and solving the alignment problem; see our Alignment plan and Alignment research pages.


For a collection of three labs’ public statements on AI policy, see AI labs’ statements on governance (Stein-Perlman 2023).

  1. On Hassabis, see e.g.:

    On Legg, see e.g.:

  2. On Chief AI Scientist Yann LeCun, see e.g.:

  3. See e.g.:

    • Introducing Superalignment (OpenAI: Leike and Sutskever 2023)
    • Planning for AGI and beyond (OpenAI: Altman 2023)

    Altman is more concerned about misuse than misalignment; see e.g.:
    • Altman interview with StrictlyVC (2023):

      “And the bad case—and I think this is important to say—is lights out for all of us. I’m more worried about an accidental misuse case in the short term where someone gets a super powerful– it’s not like the AI wakes up and decides to be evil. I think all of the traditional AI safety thinkers reveal a lot more about themselves than they mean to when they talk about what they think the AGI is going to be like. But I can see the accidental misuse case clearly and that’s super bad. So I think it’s impossible to overstate the importance of AI safety and alignment work. I would like to see much much more happening. But I think it’s more subtle than most people think. You hear a lot of people talk about AI capabilities and AI alignment as orthogonal vectors– you’re bad if you’re a capabilities researcher and you’re good if you’re an alignment researcher. It actually sounds very reasonable, but they’re almost the same thing. Deep learning is just going to solve all of these problems and so far that’s what the progress has been. And progress on capabilities is also what has let us make the systems safer and vice versa surprisingly. So I think [] none of the soundbite easy answers work.”

    • Altman interview with Bloomberg (2023):

      Bloomberg: You signed a 22-word statement warning about the dangers of AI. It reads: “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.” Connect the dots for us here. How do we get from a cool chatbot to the end of humanity?

      Altman: Well, we’re planning not to.

      Bloomberg: That’s the hope– but there’s also the fear?

      Altman: I think there’s many ways it could go wrong. We work with powerful technology that can be used in dangerous ways very frequently in the world. And I think we’ve developed, over the decades, good safety system practices in many categories. It’s not perfect, and this won’t be perfect either. Things will go wrong, but I think we’ll be able to mitigate some of the worst scenarios you could imagine. You know, bioterror is a common example; cybersecurity is another. There are many more we could talk about. But the main thing that I feel is important about this technology is that we are on an exponential curve, and a relatively steep one. And human intuition for exponential curves is really bad in general. It clearly was not that important in our evolutionary history. So given that we all have that weakness I think we have to really push ourselves to say– OK, GPT-4, not a risk like you’re talking about there, but how sure are we that GPT-9 won’t be? And if it might be—even if there’s a small percentage chance of it being really bad—that deserves great care.

    But see his old blogposts concerned with misalignment, in particular Machine intelligence, part 1 (Altman 2015). 

  4. See e.g.: