AI companies should do (extreme-risk-focused) safety research, and they should publish it to boost safety at other AI companies. Additionally, they should assist external safety researchers by sharing deep model access and providing mentorship.
I count safety papers (and other research artifacts) published by each of the AI companies from 2023 through April 2025. (This is crude; some papers are much more valuable than others.) I only include work relevant to extreme risks (e.g. understanding risks from misalignment, improving alignment and control, evaluating for dangerous capabilities, and robustness and misuse prevention when relevant to extreme risks). I give a score of
(publications per year ÷ 20)^0.75 × 100%,
capped at 100%, where publications per year is
(3 × [2025 publications] + 2 × [2024 publications] + 1 × [2023 publications]) ÷ (3 × [fraction of 2025 elapsed] + 2 + 1).
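For concreteness, here is a minimal sketch of this scoring formula in Python. The function name and the example publication counts are hypothetical and purely illustrative, not actual company data.

```python
def publication_score(pubs_2023: int, pubs_2024: int, pubs_2025: int,
                      fraction_of_2025_elapsed: float) -> float:
    """Score extreme-risk-relevant safety publications, recency-weighted.

    Recent years get more weight (3x for 2025, 2x for 2024, 1x for 2023),
    and the partial 2025 count is annualized via the fraction of 2025 elapsed.
    """
    pubs_per_year = (3 * pubs_2025 + 2 * pubs_2024 + 1 * pubs_2023) / (
        3 * fraction_of_2025_elapsed + 2 + 1
    )
    # (publications per year / 20) ^ 0.75, capped at 100%
    return min((pubs_per_year / 20) ** 0.75, 1.0)


# Hypothetical example: 10 papers in 2023, 15 in 2024, 8 so far in 2025,
# evaluated when one third of 2025 has elapsed (pubs_per_year = 16).
print(f"{publication_score(10, 15, 8, 1/3):.0%}")  # ~85%
```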
Crucially, safety research and evals often benefit from (1) having a version of the model with no safety mitigations (whether post-training or inference-time) and (2) being able to do fine-tuning and RL. External research also benefits from early access to powerful models, but that is more complicated. AI companies could also subsidize model access for safety research, but that seems less important.
(This criterion is about boosting external safety research, not red-teaming the model.)
Rubric:
Sharing model access:

| Score | Notes |
|---|---|
| 5% | |
| 0% | None |
| 10% | Some fine-tuning and early access for safety researchers |
| 40% | Meta releases its model weights |
| 0% | None |
| 40% | Microsoft released the weights of its Phi-4 models |
| 40% | DeepSeek releases its model weights |
Mentoring:

| Score | Notes |
|---|---|
| 100% | Lots of mentoring via MATS and Anthropic Fellows |
| 100% | Lots of mentoring via MATS |
| 20% | A little mentoring via MATS |
| 0% | None |
| 0% | None |
| 0% | None |
| 0% | None |