60% | Anthropic does model evals for dangerous capabilities; they're fine, but the elicitation quality is unclear (and it lacks scheming evals and difficult bio evals)
60% | DeepMind does model evals for dangerous capabilities; they're fine, but the elicitation is likely poor (and the bio evals are too easy)
70% | OpenAI does model evals for dangerous capabilities; they're fine, but the elicitation quality is unclear (and it lacks difficult bio evals)
2% | Meta does some evals for cyber and chem/bio capabilities, but it doesn't share much information, it's not clear whether the evals are good, and its elicitation is very poor
2% | xAI is planning to do a cyber capability eval and some simple bio capability evals, but no uplift experiments, AI R&D capability evals, or scheming evals, and it doesn't have an elicitation plan
2% | Microsoft is planning to do evals for some dangerous capabilities, but its plan is very vague
0% | No, DeepSeek doesn't seem to have done evals for dangerous capabilities or have a plan |
75% | Anthropic does model organisms research and related "alignment auditing" research
10% | DeepMind publishes a little research related to model organisms
0% | OpenAI doesn't do anything on this
0% | Meta doesn't do anything on this
0% | xAI doesn't do anything on this
0% | Microsoft doesn't do anything on this
0% | DeepSeek doesn't do anything on this