Published: May 17, 2024

Yesterday was my last day as head of alignment, superalignment lead, and executive @OpenAI.

It's been such a wild journey over the past ~3 years. My team launched the first ever RLHF LLM with InstructGPT, published the first scalable oversight on LLMs, pioneered automated interpretability and weak-to-strong generalization. More exciting stuff is coming out soon.

I love my team. I'm so grateful for the many amazing people I got to work with, both inside and outside of the superalignment team. OpenAI has so much exceptionally smart, kind, and effective talent.

Stepping away from this job has been one of the hardest things I have ever done, because we urgently need to figure out how to steer and control AI systems much smarter than us.

I joined because I thought OpenAI would be the best place in the world to do this research. However, I have been disagreeing with OpenAI leadership about the company's core priorities for quite some time, until we finally reached a breaking point.

I believe much more of our bandwidth should be spent getting ready for the next generations of models, on security, monitoring, preparedness, safety, adversarial robustness, (super)alignment, confidentiality, societal impact, and related topics.

These problems are quite hard to get right, and I am concerned we aren't on a trajectory to get there.

Over the past few months my team has been sailing against the wind. Sometimes we were struggling for compute and it was getting harder and harder to get this crucial research done.

Building smarter-than-human machines is an inherently dangerous endeavor. OpenAI is shouldering an enormous responsibility on behalf of all of humanity.

But over the past years, safety culture and processes have taken a backseat to shiny products.

We are long overdue in getting incredibly serious about the implications of AGI. We must prioritize preparing for them as best we can. Only then can we ensure AGI benefits all of humanity.

OpenAI must become a safety-first AGI company.

To all OpenAI employees, I want to say: Learn to feel the AGI. Act with the gravitas appropriate for what you're building. I believe you can "ship" the cultural change that's needed. I am counting on you. The world is counting on you. :openai-heart:

@janleike Are you the bigot who trained ChatGPT to hate Christians all in the name of “safety,” or was that some other asshole?

@janleike Based

@janleike based

@janleike Jan, this is completely bullshit

@janleike "Safety culture" 😆

@janleike I would recommend you try to build AI in the EU instead. The culture of regulation is so strong that you'll be able to build the safest AI on earth

@janleike brutal

@janleike This is a big one.

@janleike Does this have anything to do with the relationship with Lovefrom ?

@janleike I am concerned there's not enough AI alignment work happening so I am resigning from my job as head of alignment at the world's most important AI company

@janleike @WholeMarsBlog Only in hindsight do people understand how important the impact of an event would be. @elonmusk buying Twitter, creating a platform of free speech, for moments like this. AI is a serious threat if not done right. A black-swan event that one would hope to not happen. But

@janleike Safetyism = Obscurantism

Image in tweet by Jan Leike

@janleike finally reality comes to the surface.. we are not safe with the current AI development

@janleike Someone finally acknowledged the truth from inside of openai which lots of safety folks have been telling for years.

@janleike The regulatory capture gambit didn't quite work out and so selling doom had to take a backseat with all related work in safety-focused alignment - for profit incentives don't work with this kind of high stakes research

@janleike Thank you, very much, for being willing to say this publicly; I think it has a real chance of helping to improve governance outcomes 💚

@janleike No shiny products, no users.

@janleike This is something to take note, new extremely powerful models without safety 😈

@janleike GPT-4o is on your side, so are we!

Image in tweet by Jan Leike

@janleike Cool.

@janleike Chatgpt was better before all the safety and censoring was implemented.

@janleike Of course by Shiny Objects you mean $$ - but money and increased private stock valuations also enrich the employees, so it's a double-edged sword, and the main reason many employees chose Sam over Ilya.

@janleike If there is no product that is unsafe (gpt4 guard rails are already overzealous), what are you worried about? Seems an overreaction to something that does not exist

@janleike Coming from someone who was INSIDE OpenAI, this is concerning.

@janleike They need to get out of the car, only slowing us down

@janleike Worrying @POTUS

@janleike Quite concerning

@janleike Good. Get fucked silly goose

@janleike Ah, the daily dose of breezy reassurance that everything will be alright
