An RSP specifies what level of AI capabilities an AI developer is prepared to handle safely with their current protective measures, and conditions under which it would be too dangerous to continue deploying AI systems and/or scaling up AI capabilities until protective measures improve. . . .
A good RSP should cover:
- Limits: which specific observations about dangerous capabilities would indicate that it is (or strongly might be) unsafe to continue scaling?
- Protections: what aspects of current protective measures are necessary to contain catastrophic risks?
- Evaluation: what are the procedures for promptly catching early warning signs of dangerous capability limits?
- Response: if dangerous capabilities go past the limits and it’s not possible to improve protections quickly, is the AI developer prepared to pause further capability improvements until protective measures are sufficiently improved, and treat any dangerous models with sufficient caution?
- Accountability: how does the AI developer ensure that the RSP’s commitments are executed as intended; that key stakeholders can verify that this is happening (or notice if it isn’t); that there are opportunities for third-party critique; and that changes to the RSP itself don’t happen in a rushed or opaque way?
Responsible Scaling Policies (METR 2023).
We like RSPs because it is good for labs to use good risk assessment, to implement safety practices as a function of risk assessment results, and to make public commitments about all of this, which helps other labs act safely too.
Our analysis relevant to RSPs is spread across “Risk assessment,” “Deployment,” “Security,” and “Training.”
Reading list:
- Responsible Scaling Policies and Key Components of an RSP (METR 2023)
- Common Elements of Frontier AI Safety Policies (METR 2024)
- Thoughts on responsible scaling policies and regulation (Christiano 2023)
- RSPs are pauses done right (Hubinger 2023)
- We’re Not Ready: thoughts on “pausing” and responsible scaling policies (Karnofsky 2023)
- What’s up with “Responsible Scaling Policies”? (Habryka and Greenblatt 2023)
- Responsible Scaling Policies Are Risk Management Done Wrong (Campos 2023)
Anthropic has an RSP. See their blogpost, policy, and comments.
Google DeepMind has an RSP, which they call their Frontier Safety Framework. It does not yet connect specific safety measures to specific risk assessment results.
OpenAI has an RSP, which they call their Preparedness Framework.