We collect actions for frontier AI labs to avert extreme risks from AI, then evaluate particular labs accordingly.
These criteria are not exhaustive or sufficient: they’re the best we’ve got given what we currently know, given that they have to be concrete, and given that they have to be evaluated based on public information. Some important variables are hard to measure, especially from outside the labs.
This website is a low-key research preview. We’re sharing it to elicit feedback and help determine which direction to take it. Although it’s in beta, we endorse the content (modulo a few criteria where it’s hard to find sources) and you are welcome to share and discuss it publicly. Leave feedback here.
Avoid dangers in deployment, and boost safety without boosting dangerous research
Predict models' dangerous capabilities and check for warning signs of those capabilities
Make new systems safer, create them cautiously, and advance safer kinds of systems
Understand and control systems the lab creates
Prevent model weights and research from being stolen
Have internal structure and processes to promote safety and help make important decisions well
Perform and share alignment research at all; have an alignment research team
Make a plan for aligning powerful systems the lab creates
Be aware that AI poses extreme risks, that AI safety might be really hard, and that risks might be hard to notice
This scorecard is mostly up to date as of 1 November 2024. The scores themselves are largely current, but the site contains many references to old models, policies, and facts.
*We believe that the Security scores are very poor indicators of labs' security. Security is hard to evaluate from outside an organization, and few organizations say much about their security. But we believe that each security criterion corresponds to a good ask for labs. If you have suggestions for better criteria, can point us to sources showing that our scoring is wrong, or want to convince us that some of our criteria should be removed, please let us know.