About

We[1] collect safety recommendations for frontier AI labs and information about what those labs are doing, then evaluate the labs on safety. We focus on what labs should do to prevent extreme risks such as human extinction.


This project is by Zach Stein-Perlman, an independent AI governance researcher. He’s particularly interested in what frontier AI labs can do to make the long-term future go well. He previously worked at AI Impacts on AI forecasting, strategy, and governance.


Contact us here. How you can help:

  • Tell us what’s wrong/missing; point us to sources on what labs should do or what they are doing
  • Suggest better evaluation criteria
  • Provide feedback on how you use this site, how this project is helpful to you, or what would make it more helpful
  • Share this project
  • Publicly write about what labs should do, or about operationalizations/proxies/evaluations
  • Help this project find an institutional home
  • Volunteer your webdev skills
  • (Pitch me on new projects or offer me a job)
  • (Want to help and aren’t sure how? Get in touch!)

Suggested citation model for the whole project:

AI Lab Watch. ailabwatch.org. Accessed 30 Apr. 2024.

Suggested citation model for a specific page:

AI Lab Watch. “Other scorecards & evaluation.” ailabwatch.org/resources/other-scorecards/. Accessed 30 Apr. 2024.
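
If you use BibTeX, an entry along these lines should work for the whole project (the entry key and field choices are our suggestion, not an official format):

  @misc{ailabwatch,
    title        = {AI Lab Watch},
    author       = {{AI Lab Watch}},
    howpublished = {\url{https://ailabwatch.org}},
    note         = {Accessed 30 Apr. 2024}
  }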

We don’t currently have a good way for you to cite stable snapshots of this site. At least, links to pages should never break (though some links to sections within pages will).


Clarifications & disclaimers

There is little expert consensus on which possible risks from advanced AI are real. This project focuses on the prospect of extreme risks such as human extinction, especially due to uncontrollable AI.

This project is concerned with future AI systems that could cause a global catastrophe. Dangerous paths seem to include language models, some bioengineering, and maybe RL agents. They do not include vision, image generation, narrow game-playing, medical applications (except drug discovery), or prosaic data analysis, and probably not robotics.

This project evaluates AI labs, not AI systems or products. It implies little about which current systems or products are ‘safer’ or ‘more ethical.’ All existing AI systems are safe in the sense that they can’t cause a catastrophe.

The evaluation component of this project does not attempt to answer a sharp question. It aims to answer something like “how virtuous are various labs, in terms of AI safety.” It does not directly answer questions like “which labs are net-positive for AI safety,” “which labs should we prefer to create powerful AI,” or “which lab should you work at”; those questions involve different criteria.

A high score would mean that the lab is doing well on safety, or at least on the currently measurable and externally visible components of safety; it does not correspond to meeting any single specific criterion. Scoring 100% is not necessary or sufficient for a lab to ensure safety. (An early version of this project gave labs 100% in a category unless they were failing to take a specific known action. Given that setup, if I were in charge of a lab, I could quickly achieve 100%. But now labs can be scored on safety metrics where nobody yet knows how to do well. Labs should discover how to do well on these metrics; in the meantime, safety is lacking.)

This project sometimes feels like it’s forced to look under the lamppost—to evaluate what’s possible to evaluate well rather than what’s actually important. Properties of labs that are important but hard to evaluate well include: a lab’s behavior/safety when models are more powerful, what a lab will do if safety is hard, how a lab would use its position to prevent other labs from causing a catastrophe, and hidden attitudes. Many crucial variables are hard to measure, whether they’re about labs’ future behavior or just opaque from outside the lab.

This project focuses on labs’ current practices and plans (and recent practices insofar as they’re evidence about current practices). So it mostly does not consider factors like whether a lab’s actions many years ago are responsible for a dangerous research path today, or whether the lab shouldn’t have been founded at all.

Why these labs, and what about other labs? OpenAI, Google DeepMind, and Anthropic tentatively seem to lead in language model capabilities. Meta AI is not far behind and is notable for pushing the frontier of language models with publicly available weights. Microsoft is notable for its partnerships with other labs and for its Copilot chatbot and Azure platform, and it seems likely to make frontier models itself in the future. Safety recommendations for non-frontier labs and models are pretty different; many safety techniques and kinds of caution are unnecessary for them. See our Other labs page.

Our methodology will change over time, so changes in a lab’s score over time may not be meaningful.

Our evaluation is somewhat crude. As mentioned above, many important variables are hard to measure, and many possible safety improvements would not be captured by our evaluation. In particular, our evaluation sometimes compresses much of a lab’s safety in a category into just a few binary variables on whether the lab is implementing particular practices. We also neglect interactions between categories and other nuances. For example, security for model weights mostly only matters if a lab would try to keep those weights private, and producing shared goods like alignment research or information about AI risk is relatively more important for labs that won’t create critical systems first.

Some known issues (suggestions welcome):

  • Our Security evaluations underestimate the security of big tech companies that don’t publish much about their security.
  • Our Alignment program criterion is binary; ideally it would capture the quality and quantity of labs’ alignment research.
  • (Some subcategories and evaluation criteria have weight zero; they need better recommendations or criteria.)
  • (Some kinds of information can’t be captured well by this project; discussed above.)

[1] This project is basically by Zach but is generally written in the first person plural. This is partly because collaborators may join the project, and partly because the project’s views are not identical to Zach’s views; for example, the project generally disregards Zach’s private, illegible, or hard-to-categorize knowledge.