Blog

November 4, 2024

The current state of RSPs

October 21, 2024

What AI companies should do

Some rough ideas

October 15, 2024

Anthropic rewrote its RSP

Some reactions

September 23, 2024

Model evals for dangerous capabilities

Testing an LM system for dangerous capabilities is crucial for assessing its risks

July 27, 2024

Safety consultations for AI lab employees

July 10, 2024

New page: Integrity

And new-ish page: Policy advocacy

June 12, 2024

Anthropic's Certificate of Incorporation

New details on the Long-Term Benefit Trust, but most questions remain

May 29, 2024

AI companies' commitments

May 27, 2024

Maybe Anthropic's Long-Term Benefit Trust is powerless

Anthropic should share the details

May 24, 2024

AI companies aren't really using external evaluators

But they should

May 21, 2024

New voluntary commitments (AI Seoul Summit)

16 companies commit to make RSPs

May 17, 2024

DeepMind's "Frontier Safety Framework" is weak and unambitious

But they are doing model evals for dangerous capabilities

AI Lab Watch

