50% | Anthropic does model evals for dangerous capabilities; some of the evals seem good, but the elicitation quality is unclear and it's very unclear why Anthropic thinks they help rule out dangerous capabilities | |
50% | DeepMind does model evals for dangerous capabilities; the evals themselves seem fine (perhaps too easy), but the elicitation is likely poor and it generally doesn't report enough details or explain how it interprets results | |
50% | OpenAI does model evals for dangerous capabilities; the evals themselves are fine, but the elicitation quality is unclear, and OpenAI's interpretation of the results is confusing, especially for bio | |
2% | Meta does some evals for cyber and chem/bio capabilities, but it doesn't share much information; it's not clear whether the evals are good, and its elicitation is very poor | |
2% | xAI is planning to do a cyber capability eval and some simple bio capability evals, but no uplift experiments, AI R&D capability evals, or scheming evals, and it doesn't have an elicitation plan | |
2% | Microsoft is planning to do evals for some dangerous capabilities, but its plan is very vague | |
0% | DeepSeek doesn't seem to have done evals for dangerous capabilities or to have a plan to do them |
75% | Anthropic does model organisms research and related "alignment auditing" research | |
10% | DeepMind publishes a little research related to model organisms | |
0% | OpenAI doesn't do anything on this |
0% | Meta doesn't do anything on this |
0% | xAI doesn't do anything on this |
0% | Microsoft doesn't do anything on this |
0% | DeepSeek doesn't do anything on this |