Categories of good actions for labs

Deployment (weight 24%): labs should do risk assessment before deployment. They should release their systems narrowly to maintain control over them. They should use inference-time safety techniques and monitor their systems. They should deploy in ways that boost safety research while avoiding boosting capabilities research.

Risk assessment (weight 20%): labs should detect threats arising from their systems, in particular by measuring those systems’ dangerous capabilities. They should make commitments about how they plan to mitigate those threats: specifically, commitments about their decisions (for training and deployment), their safety practices (in model control and security), and the goals or safety levels they will achieve (in control and security), as a function of dangerous capabilities or other risk assessment results.

Training (weight 14%): labs should prioritize safety in their decisions about what kinds of systems to create, and they should use safety techniques during training.

Scalable alignment (weight 10%): labs should be able to understand and control the systems they create.

Security (weight 9%): labs should ensure that they do not leak model weights, code, or research; if these leak, other actors could unsafely deploy near-copies of a lab’s models. Achieving great security is really hard: by default, powerful actors can probably exfiltrate vital information from AI labs. Such actors will likely want to steal from labs developing critical systems, so those labs will need excellent cybersecurity and operational security.

Internal governance (weight 8%): labs should have a governance structure and processes to promote safety and help make important decisions well.

Alignment program (weight 6%): labs should do and share alignment research as a public good, to help make powerful AI safer even if it’s developed by another lab. They should also share information to advance safer kinds of systems.

Alignment plan (weight 6%): labs should make a plan for alignment and publish it: to elicit feedback, to inform others’ plans and research (especially other labs and external alignment researchers who can support or complement their plan), and to help the lab notice and respond to information indicating that its plan needs to change. They should omit dangerous details if those exist, and they should update the plan as their understanding of AI risk and safety techniques improves. Sharing the plan also enables outsiders to evaluate the lab’s attitudes on AI risk and safety.

Public statements (weight 3%): labs and their leadership should be aware of AI risk, aware that AI safety might be really hard, and aware that risks might be hard to notice.
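
To make the weights concrete, here is a minimal sketch of how they might combine per-category scores into an overall score. It assumes a simple weighted sum over categories each scored on a 0–1 scale; the function and the example scores are illustrative assumptions, not the actual evaluation method.

```python
# Sketch only: assumes an overall score is a weighted sum of per-category
# scores (each 0-1), using the category weights listed above.

CATEGORY_WEIGHTS = {
    "Deployment": 0.24,
    "Risk assessment": 0.20,
    "Training": 0.14,
    "Scalable alignment": 0.10,
    "Security": 0.09,
    "Internal governance": 0.08,
    "Alignment program": 0.06,
    "Alignment plan": 0.06,
    "Public statements": 0.03,
}


def overall_score(category_scores: dict[str, float]) -> float:
    """Combine per-category scores (0-1) into an overall score (0-1).

    Missing categories are treated as scoring 0.
    """
    # The listed weights sum to 100%.
    assert abs(sum(CATEGORY_WEIGHTS.values()) - 1.0) < 1e-9
    return sum(
        weight * category_scores.get(category, 0.0)
        for category, weight in CATEGORY_WEIGHTS.items()
    )


# Hypothetical example: a lab strong on deployment and risk assessment
# but weak elsewhere.
print(overall_score({"Deployment": 0.8, "Risk assessment": 0.7, "Security": 0.3}))
```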