AI Lab Watch

Risk assessment	44%	29%	32%	1%	1%	1%	0%	28% weight
Scheming risk prevention	2%	8%	2%	2%	2%	2%	2%	21% weight
Boosting safety research	68%	56%	37%	28%	0%	15%	8%	13% weight
Misuse prevention	12%	4%	5%	0%	0%	0%	0%	12% weight
Prep for extreme security	2%	5%	0%	0%	0%	0%	0%	12% weight
Risk info sharing	35%	13%	32%	0%	28%	0%	0%	8% weight
Planning	14%	26%	0%	0%	0%	1%	0%	6% weight
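
The weights in the last column sum to 100%, which suggests that a company's overall score is a weighted average of its category scores. The page does not state the aggregation formula, so the sketch below is an assumption, not the site's actual method. It uses the Microsoft category scores listed on this page (they match the sixth score column of the table); the names `WEIGHTS`, `MICROSOFT`, and `weighted_overall` are hypothetical.

```python
# Category weights from the last column of the table (percent of overall score).
WEIGHTS = {
    "Risk assessment": 28,
    "Scheming risk prevention": 21,
    "Boosting safety research": 13,
    "Misuse prevention": 12,
    "Prep for extreme security": 12,
    "Risk info sharing": 8,
    "Planning": 6,
}

# Microsoft's category scores as shown on this page (percent, already rounded).
MICROSOFT = {
    "Risk assessment": 1,
    "Scheming risk prevention": 2,
    "Boosting safety research": 15,
    "Misuse prevention": 0,
    "Prep for extreme security": 0,
    "Risk info sharing": 0,
    "Planning": 1,
}


def weighted_overall(scores, weights):
    """Weighted average of category scores (ASSUMED aggregation rule)."""
    total_weight = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


overall = weighted_overall(MICROSOFT, WEIGHTS)
print(f"Overall (assumed weighted average): {overall:.2f}%")
```

Because the displayed category scores are rounded to whole percentages, any figure computed this way is only approximate.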

Microsoft

Risk assessment 1%

Evals: domains, quality, elicitation

Evals: accountability

Adversarial evaluation for alignment

Model organisms

AI companies should run model evals and uplift experiments to determine whether models have dangerous capabilities or how close they are to having them. They should also prepare to check whether models will act well in high-stakes situations.

Compare all on risk assessment

Evals: domains, quality, elicitation

Evals: accountability

Adversarial evaluation for alignment

Model organisms

Scheming risk prevention 2%

Safety case: process

Safety case: accountability

Internal deployment protocol

External deployment protocol

Plan for if an AI is caught scheming

Training: remove scheming capabilities

Training: adversarial training

50%

Training: safe architecture

AIs show signs that if they were more capable, they would sometimes scheme, i.e. fake alignment and subvert safety measures in order to gain power. AI companies should prepare for risks from models scheming, especially during internal deployment: if they can't reliably prevent scheming, they should prepare to catch some schemers and deploy potentially scheming models safely.

Compare all on scheming risk prevention

Safety case: process

Safety case: accountability

Internal deployment protocol

External deployment protocol

Plan for if an AI is caught scheming

Training: remove scheming capabilities

Training: adversarial training

Training: safe architecture

50%

Boosting safety research 15%

Publishing safety research

40%

Deep access for external safety researchers

Mentoring external safety researchers

AI companies should do (extreme-risk-focused) safety research, and they should publish it to boost safety at other AI companies. Additionally, they should assist external safety researchers by sharing deep model access and mentoring.

Compare all on boosting safety research

Publishing safety research

Deep access for external safety researchers

40%

Mentoring external safety researchers

Misuse prevention 0%

Safety case: process

Safety case: accountability

Removing dangerous capabilities

Emergency protocol

AI companies should prepare to prevent catastrophic misuse of models deployed via API, once models are capable of enabling catastrophic harm.

Compare all on misuse prevention

Safety case: process

Safety case: accountability

Removing dangerous capabilities

Emergency protocol

Prep for extreme security 0%

Plan for SL5

Red-team resilience

Practices: isolated network

Practices: secure developers' machines

Practices: multiparty controls

Practices: secure boot

Track record

AI companies should prepare to protect model weights and code by the time AI massively boosts R&D, even from top-priority operations by the top cyber-capable institutions.

Compare all on prep for extreme security

Plan for SL5

Red-team resilience

Practices: isolated network

Practices: secure developers' machines

Practices: multiparty controls

Practices: secure boot

Track record

Compare all on risk info sharing

Planning 1%

Safety plan

Plan for how to use AGI

Prepare for a pivot

AI companies should plan for the possibility that dangerous capabilities appear soon and safety isn't easy: both for evaluating and improving the safety of their systems and for using their systems to make the world safer.

Compare all on planning

Safety plan

Plan for how to use AGI

Prepare for a pivot

Risk info sharing 0%

Incident reporting

Talk about extreme risks

Describe worst-case outcome

Don't publish some capabilities research
