Frontier AI labs should have a governance structure and processes to promote safety and help make important decisions well, especially decisions about safety practices and model training and deployment. Mostly this is an open problem that the labs should solve. Possible desiderata include:
- The lab should have a mandate for safety and benefit-sharing
- An independent board (with a safety and benefit-sharing mandate) should exercise effective oversight
- Power should be held by people without profit incentives
- The lab should promote safety culture internally
- The lab should encourage its staff to share safety concerns
What labs should do
How should AI labs be structured and what organizational practices should they adopt to promote safety? This is a hard question. Moreover, it can be hard to verify that good-sounding policies are actually effective.
Maybe labs’ organizational structure is important. It seems good for a lab to be controlled by a nonprofit or be a public benefit corporation, and for profit-motivated investors to have minimal control over the lab’s behavior.
A lab’s primary mandate should prioritize safety, avoiding catastrophic risk, and achieving shared long-term benefit from powerful AI.
Maybe labs should have a “board risk committee,” a “chief risk officer,” “internal review boards,” and more.1 Maybe AGI labs need an internal audit function.
We do not know how labs can improve their safety culture. Labs should have a good process for staff to raise concerns about safety; we don’t know what that process should look like.
Labs should ensure that their board members prioritize safety and security, and should empower their boards to track progress on safety and security.
It seems good for leadership and staff to have little or ideally no equity in the lab. But most of a lab’s leadership’s self-centered incentives come from power rather than money, so a leader may prioritize their own lab’s success even with no equity.
Labs should plan for the possibility of pausing for safety, and in particular financially prepare for a pause and plan for what their staff would work on during a pause.
If labs were more transparent about how safe they are, that would incentivize them to be safer, help observers distinguish responsible from irresponsible labs, push other labs to become more transparent and safer, and make it easier for labs to coordinate with each other. Labs should support, facilitate, and encourage whistleblowing about dishonest, reckless, or ruthless behavior, as well as about practices or products that violate regulations, standards, or industry best practice. Labs should have an ombuds process to help staff investigate, resolve, or amplify concerns; we don’t know what the details of that process should look like. Ambitiously, labs should adopt a strong transparency mechanism based on staff flagging false statements; see Honest organizations (Christiano 2018). Cryptographic mechanisms such as ring signatures could enable provably anonymous whistleblowing, but this is not necessary.
Perhaps a group that understands AI risk should be able to veto access to model weights and other critical IP, legally and in practice.2
On personnel, a lab should select for people who prioritize safety and avoiding extreme risks, and it should explain to staff that it makes tradeoffs for safety like not publishing most research, pausing if necessary, and potentially helping another lab build powerful AI safely rather than pursuing it itself.
Labs should have a good process for revising safety commitments and practices, including some independent oversight/review. This is currently evaluated as “Revising policies” and “Elicit external review of risk assessment practices and commitments” in “Risk assessment.”
Evaluation
Organizational structure (30%): the lab is structured in a way that enables it to prioritize safety in key decisions, legally and practically.
- (20/40) The lab and its leadership have a mandate for safety and benefit-sharing and have no overriding legal duty to create shareholder value.
- (1/40) There is a board with ultimate formal power, and its main mandate is for safety and benefit-sharing.
- (1/40) … and it actually provides effective oversight.
- (5/40) … and it is independent (i.e. its members have no connection to the company or profit incentive) (full credit if the board is fully independent, no credit if it is only half independent, partial credit in between).
- (1/40) … and the organization keeps it well-informed.
- (2/40) … and it has formal powers related to risk assessment, training, or deployment decisions.
- (10/40) Investors/shareholders have no formal power.
Planning for pause (15%): have a plan for the details of what the lab would do if it needed to pause for safety, and publish relevant details. In particular, explain what the lab’s capabilities researchers would work on during a pause, and say that the lab stays financially prepared for a one-year pause. Note that in this context “pause” can include pausing development of dangerous capabilities and internal deployment, not just public releases.
Leadership incentives (5%): the lab’s leadership is incentivized in a way that helps it prioritize safety in key decisions.
- (1/2) The CEO has no equity (or other financial interest in the company).
- (1/2) Other executives have no equity.
Ombuds/whistleblowing/trustworthiness (50%):
- (3/5) Have a reasonable process for staff to escalate concerns about safety. If this process involves an ombud, the ombud should be transparently independent or safety-focused. Evaluated holistically.
- (2/5) Promise that the lab does not use non-disparagement agreements (nor otherwise discourage current or past staff or board members from talking candidly about their impressions of and experiences with the lab).
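To make the arithmetic concrete, here is a minimal sketch of how the rubric above might be turned into an overall score. It assumes (this is not stated explicitly above) that sub-item points are summed within each category, each category is scaled by its percentage weight, and “Planning for pause” is scored as a single all-or-nothing item; the example lab’s numbers are hypothetical.

```python
# Illustrative only: one way to combine the rubric's weights and sub-item points.
# Assumption: sub-items sum within a category, then the category is scaled by its weight.

RUBRIC = {
    # category: (weight, max points across its sub-items)
    "Organizational structure": (0.30, 40),
    "Planning for pause": (0.15, 1),   # treated as a single 0-or-1 item (assumption)
    "Leadership incentives": (0.05, 2),
    "Ombuds/whistleblowing/trustworthiness": (0.50, 5),
}

def overall_score(earned: dict) -> float:
    """Weighted sum across categories, returned as a percentage."""
    total = 0.0
    for category, (weight, max_points) in RUBRIC.items():
        fraction = max(0.0, min(1.0, earned.get(category, 0) / max_points))
        total += weight * fraction
    return 100 * total

# Hypothetical lab: safety mandate (20) plus half credit on board independence (2.5);
# no pause plan; CEO holds equity but other executives don't (1 of 2); both ombuds items (5 of 5).
example = {
    "Organizational structure": 22.5,
    "Planning for pause": 0,
    "Leadership incentives": 1,
    "Ombuds/whistleblowing/trustworthiness": 5,
}
print(f"{overall_score(example):.1f}%")  # -> 69.4%
```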
Sources
Safety culture:
- Building a Culture of Safety for AI (Manheim 2023) (blogpost).
- “Managing Safety and the Safety Culture” (chapter 13) in “Engineering a Safer World” (Leveson 2009), summarized in “Organizational safety” in “Alignment Newsletter #112” (Shah 2020).
Three lines of defense against risks from AI (Schuett 2022), especially §3–4.
AGI labs need an internal audit function (Schuett 2023).
How to Design an AI Ethics Board (Schuett et al. 2023).
Towards best practices in AGI safety and governance (Schuett et al. 2023):
- “Dual control. Critical decisions in model development and deployment should be made by at least two people (e.g. promotion to production, changes to training datasets, or modifications to production).”
- “Board risk committee. AGI labs should have a board risk committee, i.e. a permanent committee within the board of directors which oversees the lab’s risk management practices.”
- “Chief risk officer. AGI labs should have a chief risk officer (CRO), i.e. a senior executive who is responsible for risk management.”
- “Statement about governance structure. AGI labs should make public statements about how they make high-stakes decisions regarding model development and deployment.”
- “Third-party governance audits. AGI labs should commission third-party audits of their governance structures.”
- “Internal audit. AGI labs should have an internal audit team, i.e. a team which assesses the effectiveness of the lab’s risk management practices. This team must be organizationally independent from senior management and report directly to the board of directors.”
- “AGI labs should have independent safety and ethics advisory boards to help with certain decisions.”
- “AGI labs should have internal review boards.”
- “AGI labs should be set up such that their governance structures permit them to tradeoff profits with societal benefit.”
- “AGI labs should report to an international non-governmental organization (INGO) that is publicly committed to human rights and democratic values.”
- “AGI labs should have an independent board of directors with technical AI safety expertise who have the mandate to put the benefits for society above profit and shareholder value.”
“Trustworthy command” in “Six Dimensions of Operational Adequacy in AGI Projects” (Yudkowsky 2022):
- Improving: There’s a legal contract which says that the Board doesn’t control the IP and that the alignment-aware research silo does.
- Adequate: The entire command structure including all members of the finally governing Board are fully aware of the difficulty and danger of alignment. The Board will not object if the technical leadership have disk-erasure measures ready in case the Board suddenly decides to try to open-source the AI anyway.
What labs are doing
Microsoft: a for-profit company. It has no governance practices to reduce extreme risk. See “Operationalizing Responsible AI at Microsoft” in “Governing AI” (Microsoft 2023), Reflecting on our responsible AI program (Microsoft 2023), and Microsoft’s Responsible AI page.
Google DeepMind: part of Google, a for-profit company. We do not understand its relationship with Google; it may have slightly more independence than a normal subsidiary. In 2021, Google disallowed DeepMind from pursuing more independent governance.3 We believe Google could take control of Google DeepMind’s systems or replace its leadership.
See DeepMind’s About page, DeepMind’s old Operating Principles, Google’s AI Principles, and Google’s Responsible AI practices. Google has an “Advanced Technology Review Council,” but it does not seem to be set up to responsibly supervise the development or deployment of powerful AI systems. Google’s reports on its approach to responsible AI and its page on AI governance don’t mention strong governance mechanisms for AI safety. See also “Introduction” and “Internal Governance and Operations” in 2022 AI Principles Progress Update (Google 2023).
Meta AI: part of Meta, a for-profit company. It has no governance practices to reduce extreme risk. See Responsible AI (Meta AI) and Facebook’s five pillars of Responsible AI (Meta AI 2021).
OpenAI: an unusual structure in which a nonprofit controls a capped-profit company. It is not clear how the cap works or where it is set. In 2019, OpenAI said “Returns for our first round of investors are capped at 100x their investment.” In 2021, Altman said the cap was “single digits now.” But reportedly the cap will increase by 20% per year starting in 2025 (The Information 2023; The Economist 2023).
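For a rough sense of scale, here is a purely illustrative calculation of how fast a cap growing 20% per year compounds. The starting value (100x) and start year are assumptions taken from the conflicting reports above, not a claim about where the cap actually sits.

```python
# Purely illustrative: compound growth of a profit cap that rises 20% per year.
# Starting at a hypothetical 100x in 2024 (reports conflict on the actual figure).
cap = 100.0  # cap, as a multiple of the original investment
for year in range(2025, 2030):
    cap *= 1.20
    print(year, f"{cap:.0f}x")  # 2025: 120x, ..., 2029: 249x
```

Under these assumptions the cap would roughly double within four years.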
OpenAI has a strong partnership with Microsoft. The details are opaque. It seems that OpenAI is required to share its models (and some other IP) with Microsoft until OpenAI attains “a highly autonomous system that outperforms humans at most economically valuable work.” This is concerning because AI systems could cause a catastrophe before reaching that threshold. Moreover, OpenAI may substantially depend on Microsoft; in particular, Microsoft Azure is “OpenAI’s exclusive cloud provider.” Microsoft’s power over OpenAI may make it harder for OpenAI to make tradeoffs that prioritize safety or to keep dangerous systems from Microsoft.
OpenAI has 9 board members: Sam Altman, Bret Taylor, Adam D’Angelo, Larry Summers, Sue Desmond-Hellmann, Nicole Seligman, Fidji Simo, Paul M. Nakasone, and Zico Kolter.
D’Angelo has no equity in OpenAI; presumably the other board members (besides Altman) don’t either.4 This seems slightly good for incentives, but, as noted above, most of a lab leader’s incentive for their lab to succeed comes not from equity but from the power they gain if their lab succeeds. In November 2023, the former board removed CEO Sam Altman and President Greg Brockman from the board and removed Altman from OpenAI (Altman soon returned). The move by Sutskever, D’Angelo, McCauley, and Toner seems to have been precipitated by some combination of:
- Sutskever5 and other OpenAI executives telling them that Altman often lied
- Altman dishonestly attempting to remove Toner from the board (on the pretext that her coauthored paper Decoding Intentions was too critical of OpenAI, and allegedly falsely telling other board members that McCauley wanted Toner removed).6
OpenAI had the law firm WilmerHale review the episode. The report was not released, and the review tentatively seems to have been designed to find nothing.
As of 2023, Sam Altman has no direct equity in OpenAI. However, he has other unusual financial conflicts of interest: he owns the OpenAI VC fund (but this may just be an accident), and he seems to have made personal investments in AI chip companies and committed that OpenAI would buy from them [citation needed]. OpenAI and Altman have not been transparent about this.
OpenAI announced in March 2024 that it is “Adopting a new set of corporate governance guidelines” and “Creating a whistleblower hotline to serve as an anonymous reporting resource for all OpenAI employees and contractors.” It did not elaborate.
We don’t know what fiduciary duty OpenAI has to its investors.
The OpenAI Charter mentions “Broadly distributed benefits” and “Long-term safety,” including the “stop and assist” clause.
It tentatively seems that public dissent is seen as disloyal at OpenAI.
Anthropic: a public benefit corporation; we do not know exactly how much that matters in practice. We don’t know what fiduciary duty it has to its investors. Its mission is to “responsibly develop and maintain advanced AI for the long-term benefit of humanity.”
Anthropic’s Long-Term Benefit Trust is an independent7 body that can elect a majority of board members. The Trust’s initial members seemed like good choices for AI safety, but the two AI safety experts left due to conflicts of interest and have not been replaced. The Trust’s power is limited: its only formal power is electing board members, which happens too infrequently to exert much control, and investors can remove even that power. Important aspects of the Trust are opaque: what determines the number of board members it can elect, the “process for amendment,” and the “series of ‘failsafe’ provisions that allow changes to the Trust and its powers without the consent of the Trustees.” But the Trust seems better than nothing.
The board and LTBT receive regular updates on dangerous capability evaluations and implementation of safety measures, and changes to Anthropic’s Responsible Scaling Policy require approval from the board in consultation with the LTBT.
Last we heard, Anthropic’s board consisted of CEO Dario Amodei, President Daniela Amodei, Luke Muehlhauser (representing shareholders but maybe philanthropic ones), and Yasmin Razavi (representing shareholders).
Google (Feb 2023, Oct 2023) and Amazon are major investors in Anthropic. This probably doesn’t matter much, but Anthropic is not transparent about its stockholders’ stakes or powers (in particular over its Trust).
On whistleblowing, Anthropic committed to “Implement a non-compliance reporting policy.”
1. Towards best practices in AGI safety and governance (Schuett et al. 2023).
2. See “Trustworthy command” in “Six Dimensions of Operational Adequacy in AGI Projects” (Yudkowsky 2022).
4. See Our structure (OpenAI 2023). Altman has no direct equity; he has an unspecified “small” (and capped) amount of indirect equity.
7. Its members have no equity in Anthropic, and Anthropic says they receive only modest compensation.