Published: August 10, 2025

Honestly I think I can do a decent postmortem of everything which went wrong with 5: 1. Two and a half years ago, long before they had any idea what it would look like, they were hyping 5. 2. They trained Orion as 5, expecting big benefits from scaling pretraining, but (1/x)

unexpectedly it turned out that the signs of diminishing returns were correct. 3. Lucky for them they got bailed out by the relatively unexpected success of RLVR and the O-series. 4. Just while they're dealing with Orion sucking compared to what they expected...(2/x)

scaling from o1 to o3 surprisingly works very well. They're saved! They're also led to believe that continued scaling of RLVR will keep delivering amazing results (fatal mistake). 5. Meanwhile Sama gets tired of being lightly made fun of for the proliferation of models (3/x)

The model router becomes a pet idea among executives and they decide to scrap the o3 release in favor of "GPT-5." 6. They're having more struggles with o3 in general than they're willing to admit, and the router idea doesn't really work well (but it's a pet idea, so...) (4/x)

They decide to release o3 by its lonesome and push "GPT-5" back a few months. They figure that with scaling RL they can deliver something really impressive by then. There are probably already signs that scaling RLVR is disappointing but they're ignored. (5/x)

Things start looking grim in the summer. Nothing is as impressive as it should be. No big leap for GPT-5 is apparent. There is talk of delaying again. However Sama tries out the latest checkpoint and really loves it. With the cult leader/CEO directing, release is ordered (6/x).

All the while OAI employees are building a model for their own needs. How most people use and see their models really doesn't matter to them, and they apparently don't use the trial router internally. Meanwhile hype is turned up to 100% because that's the culture (7/x)

Low energy and poorly planned release stream because despite the hype, no one other than Sama is actually passionate about this model in particular. Also lots of unilateral decision making for their customers and some poor UI and infrastructure planning, because of course (8/x)

And then we end up at today. A lot of furious backtracking is happening, and their hyped release has gone poorly in every possible way (9/9).

Just to be clear, I don't think GPT-5 is some sort of awful, Llama 4-style model. I'm just explaining why things have gone so poorly despite it being, at the very least, solid.

If they had delivered something GPT-OSS style (benchmaxed model of limited broad utility), they'd be in a far worse circumstance. That might be company-killing.

@Tacticsos I totally agree. Especially the disconnect between leadership and tech fits a company that recently lost a great CTO. With Murati gone there likely was no pushback from tech. But here is what I wonder: why did they even think they could scale Orion? It made sense solely from

Image in tweet by Tactics/os

@DFinsterwalder I was literally trying to calculate the amount of "high quality" tokens in existence two years ago and wondering the same thing. All I can assume is that after GPT2->GPT3->GPT4 scaleups all worked, there wasn't a lot of room for doubt within the organization.

@Tacticsos Except the thinking model with high effort is amazing and a huge improvement for coding and tool use applications, which is where most of the economic value is now.

@doodlestein I don't think GPT-5 Thinking is a bad model at all, it's just not what they promised.

@Tacticsos You've gone on at length here, but am I right that you think this reasoning will still hold once the next Gemini comes out? Should I take it that you're insisting it's better than Claude? I'd love to hear what value there is in it merely being better than the previous GPT.

@ayu_walk2525 If the next Gemini is incredible, GPT-5 will look bad by comparison. If it's OK or bad, no one will care much. As for Claude vs. Gemini vs. GPT-5, probably depends on what you're using it for. Anyways, OpenAI depends on "hype" more than any other lab.

@Tacticsos is this what ilya saw?

@ganjamarchindia Honestly no, he was out of there before 90% of the mess happened.

@Tacticsos And that all traces back to.. @sama has a planet-sized ego and is the beneficiary of the smart people around him, who all got sick of him and left.

@Tacticsos the model's behavior and ability in coding (some types of coding) without significant compute expense is really promising. no one can definitively say that their stargate clusters won't help them achieve something a lot better. Forecasting is almost impossible

@Tacticsos All eyes on Grok5? Will scaling laws hold…

@Tacticsos All they had to do was keep 4o for a month or so and warn users that it was going to be removed after some time. That would have avoided 99% of the drama from regular customers. They specifically fine-tuned 4o to be as addictive and sycophantic as possible, and they suddenly cut it

@Tacticsos I agree with your overview. It's worth highlighting, however, that we're talking about months' worth of improvements, not years. o3 is a HUGE leap over 4o; on par with gpt 3 to gpt 4. They're also scaling compute, so GPT5 is more an appeal to the masses rather than intelligence maximisation.

@Tacticsos The simple answer is they lack the amount of gpus necessary for their ambitions. If you don’t own the hardware stack you are beholden to the constraints of your partners. This is why grok is catching up so quickly on benchmarks. Elon can scale cheaper and faster because he

@Tacticsos I wish they focused more on product features instead of the model… I guess OpenAI is Cisco

@Tacticsos this is probably pretty accurate

@Tacticsos ask it to call parallel tools and sub-process agents.. don't ask it if it can, just ask it to do it as part of a workflow. gpt5 thinking can do some wild stuff but you have to rethink your prompting.. treat it like a team of agents in the prompt
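The "team of agents" prompting pattern in the reply above can be sketched as follows. This is a minimal illustration, not an official recipe; the tool names and workflow wording are hypothetical, and the key move is instructing parallel tool calls as part of a workflow rather than asking whether the model can make them:

```python
# Sketch of a "team of agents" prompt in the style the reply describes.
# Tool names and phrasing are illustrative assumptions, not real APIs.

def build_workflow_prompt(tasks):
    """Compose a system prompt that directs parallel tool calls
    instead of asking the model whether it supports them."""
    steps = "\n".join(f"- {t}" for t in tasks)
    return (
        "You coordinate a team of sub-agents.\n"
        "For the workflow below, call the relevant tools in parallel "
        "(multiple tool calls in one turn) rather than sequentially:\n"
        + steps
    )

prompt = build_workflow_prompt([
    "search_docs: find the relevant API reference",
    "run_tests: execute the test suite",
    "lint_code: check style",
])
print(prompt.splitlines()[0])  # → You coordinate a team of sub-agents.
```

The prompt string would then be sent as the system message of a chat request; the directive framing ("call the tools in parallel") is what the reply means by asking the model to do it rather than asking if it can.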

@Tacticsos unexpectedly?

@Tacticsos What do you think of Grok?

@Tacticsos @jeremyphoward Great fan fiction. 10/10 would read more
