50% | Anthropic does model evals for dangerous capabilities; some of the evals seem good, but the elicitation quality is unclear and it's very unclear why Anthropic thinks they help rule out dangerous capabilities | |
50% | DeepMind does model evals for dangerous capabilities; the evals themselves seem fine (perhaps too easy), but the elicitation is likely poor and it generally doesn't report enough details or explain how it interprets results | |
50% | OpenAI does model evals for dangerous capabilities; the evals themselves are fine, but the elicitation quality is unclear, and OpenAI's interpretation of the results is confusing, especially for bio | |
2% | Meta does some evals for cyber and chem/bio capabilities, but it doesn't share much information; it's not clear whether the evals are good, and its elicitation is very poor | |
2% | xAI is planning to do a cyber capability eval and some simple bio capability evals, but no uplift experiments, AI R&D capability evals, or scheming evals, and it doesn't have an elicitation plan | |
2% | Microsoft is planning to do evals for some dangerous capabilities, but its plan is very vague | |
0% | DeepSeek doesn't seem to have done evals for dangerous capabilities or to have a plan to do them |
75% | Anthropic does model organisms research and related "alignment auditing" research | |
10% | DeepMind publishes a little research related to model organisms | |
0% | OpenAI doesn't do anything on this |
0% | Meta doesn't do anything on this |
0% | xAI doesn't do anything on this |
0% | Microsoft doesn't do anything on this |
0% | DeepSeek doesn't do anything on this |