Responsible scaling policies

An RSP specifies what level of AI capabilities an AI developer is prepared to handle safely with their current protective measures, and conditions under which it would be too dangerous to continue deploying AI systems and/or scaling up AI capabilities until protective measures improve. . . .

A good RSP should cover:

  • Limits: which specific observations about dangerous capabilities would indicate that it is (or strongly might be) unsafe to continue scaling?
  • Protections: what aspects of current protective measures are necessary to contain catastrophic risks?
  • Evaluation: what are the procedures for promptly catching early warning signs of dangerous capability limits?
  • Response: if dangerous capabilities go past the limits and it’s not possible to improve protections quickly, is the AI developer prepared to pause further capability improvements until protective measures are sufficiently improved, and treat any dangerous models with sufficient caution?
  • Accountability: how does the AI developer ensure that the RSP’s commitments are executed as intended; that key stakeholders can verify that this is happening (or notice if it isn’t); that there are opportunities for third-party critique; and that changes to the RSP itself don’t happen in a rushed or opaque way?

Responsible Scaling Policies (METR 2023).

We like RSPs because it is good for labs to do good risk assessment, implement safety practices as a function of risk assessment results, and make public commitments about all of this to help other labs act safely too.

Our analysis relevant to RSPs is spread across “Risk assessment,” “Deployment,” “Security,” and “Training.”

Reading list:


Anthropic has an RSP. See their blog post, policy, and comments.

Google DeepMind has an RSP, which they call their Frontier Safety Framework. It does not yet connect safety measures to risk assessment results.

OpenAI is developing an RSP, which they call their Preparedness Framework. They say they’ve adopted it, but they are clearly not yet in compliance with it.