Anthropic

Risk assessment 49%

60% Evals: domains, quality, elicitation
15% Evals: accountability
50% Adversarial evaluation for alignment
75% Model organisms

Scheming risk prevention 2%

0% Safety case: process
0% Safety case: accountability
0% Internal deployment protocol
0% External deployment protocol
0% Plan for if an AI is caught scheming
0% Training: remove scheming capabilities
0% Training: adversarial training
50% Training: safe architecture

Boosting safety research 68%

82% Publishing safety research
5% Deep access for external safety researchers
100% Mentoring external safety researchers

Misuse prevention 9%

20% Safety case: process
0% Safety case: accountability
5% Removing dangerous capabilities
0% Emergency protocol

Prep for extreme security 2%

0% Plan for SL5
0% Red-team resilience
10% Practices: isolated network
0% Practices: secure developers' machines
25% Practices: multiparty controls
0% Practices: secure boot
0% Track record

Information sharing 35%

0% Incident reporting
60% Talk about extreme risks
0% Describe worst-case outcome
75% Don't publish some capabilities research

Planning 14%

25% Safety plan
0% Plan for how to use AGI
10% Prepare for a pivot
