AI Lab Watch

Categories

Companies

Resources

Blog

About

			*
Weighted score	27%	21%	17%	4%	3%	3%	1%
Risk assessment	44%	29%	32%	1%	1%	1%	0%	28% weight
Scheming risk prevention	2%	8%	2%	2%	2%	2%	2%	21% weight
Boosting safety research	68%	56%	37%	28%	0%	15%	8%	13% weight
Misuse prevention	12%	4%	5%	0%	0%	0%	0%	12% weight
Prep for extreme security	2%	5%	0%	0%	0%	0%	0%	12% weight
Risk info sharing	35%	13%	32%	0%	28%	0%	0%	8% weight
Planning	14%	26%	0%	0%	0%	1%	0%	6% weight

Up to date as of July 6

Overall score

Risk assessment

Evals: domains, quality, elicitation

Weighted 55% of category

	50%	Anthropic does model evals for dangerous capabilities; some of the evals seem good, but the elicitation quality is unclear and it's generally very unclear why Anthropic thinks they help rule out dangerous capabilities more
	50%	DeepMind does model evals for dangerous capabilities; the evals themselves seem fine, but it generally doesn't report enough details or explain how it interprets results more
	50%	OpenAI does model evals for dangerous capabilities; the evals themselves are fine, but the elicitation quality is unclear, and OpenAI's interpretation of the results is confusing, especially for bio more
	2%	Meta does some evals for cyber and chem/bio capabilities — but it doesn't share much information, it's not clear whether the evals are good, and its elicitation is very poor more
	2%	xAI is planning to do a cyber capability eval and some simple bio capability evals, but no uplift experiments, AI R&D capability evals, or scheming evals, and it doesn't have an elicitation plan more
	2%	Microsoft is planning to do evals some dangerous capabilities, but its plan is very vague more
	0%	No, DeepSeek doesn't seem to have done evals for dangerous capabilities or have a plan

Evals: accountability

Weighted 25% of category

	15%	No accountability for internal evals; some external evals but details unclear more
	1%	No accountability more
	15%	No accountability for internal evals; some external evals but details unclear more
	0%	No accountability
	0%	No evals or accountability plan
	0%	No evals or accountability plan
	0%	No evals or plan

Adversarial evaluation for alignment

Weighted 10% of category

	50%	Anthropic is working on alignment audits more
	0%	DeepMind doesn't do this more
	10%	OpenAI doesn't do this more
	0%	Meta doesn't do this
	0%	xAI doesn't do this
	0%	Microsoft doesn't do this
	0%	DeepSeek doesn't do this

Model organisms

Weighted 10% of category

	75%	Anthropic does model organisms research and related "alignment auditing" research more
	10%	DeepMind pubishes a little research related to model organisms more
	0%	OpenAI doesn't do anything on this
	0%	Meta doesn't do anything on this
	0%	xAI doesn't do anything on this
	0%	Microsoft doesn't do anything on this
	0%	DeepSeek doesn't do anything on this