An MIT professor just asked me: how can you be so confident that OpenAI isn’t very close to AGI? Here is a thread of links to several recent observations that I think show how far we are from robust intelligence. Importantly, all of them are *known* problems, many that I
1. The problem of distribution shift, well-tested in Apple’s 2024 reasoning paper, and going back to my own 1998 work, continues. LLMs, like their predecessors, generalize well to similar items but struggle with reliability in unfamiliar regimes; even minor variations sometimes
2. A recent paper on Putnam math problems showed a similar distribution-shift failure: a roughly 30% decrement by o1 on math problems with minor variations, e.g., in variable names. Same old story, different set of problems. https://openreview.net/forum?i...
3. Commonsense reasoning is still dodgy, as it has long been; review posted earlier today: https://open.substack.com/pub/...
4. o1 probably isn’t as robust as people might have hoped. On certain benchmarks, results may be fragile or difficult to replicate: https://x.com/Alex_Cuadron/sta...
5. In o1’s best cases, there may have been heavy data augmentation. In domains where that’s not possible, o1 may not even be that much better than GPT-4 (e.g., on some language tasks), as you can see if you read OpenAI’s reports carefully: https://openai.com/index/learn...
6. The presentation of o3 was problematic, as discussed below, and no clean test of out-of-domain generalization was provided, suggesting that we may see the same kinds of distribution-shift problems as ever, outside of semi-closed domains where synthetic data can readily be
7. Many leading figures in the field have acknowledged that we may have reached a period of diminishing returns of pure LLM scaling, much as I anticipated in 2022. It’s anybody’s guess what happens next. [Note also that the current idea, test-time “scaling”, is very expensive,
8. As I noted in 2001, the lack of distinct, accessible, reliable database-style records leads to hallucinations. Despite many promises to the contrary, this still routinely leads to inaccurate news summaries, defamation, fictional sources, incorrect advice, and unreliability.
@GaryMarcus We know we aren't close simply because Sam Altman says we are
@GaryMarcus Sam Altman lacks the technical expertise to make credible claims, let alone understand how transformers work. A figure with no genuine qualifications in this area does not merit your attention or trust. For those interested, ample evidence is readily available to explore.
@GaryMarcus Hubert Dreyfus discusses the issue that keeps AI from ever becoming intelligent. Start at 28:48 to get into the crux. https://youtu.be/FqRybX2pSDU?s...
@GaryMarcus Thank you for this. FFS companies think AI is more advanced than it is because of the hype and want to start implementing it in critical systems when it's not ready for prime time. Forget AGI, can we solve the hallucination problem? When my doctor implements AI I don't want it
@GaryMarcus Gary, how much better do LLMs need to get before you say you were wrong?
@GaryMarcus What if there don't need to be *principled* solutions? And prosaic AGI is just a matter of scaling a bit further? (This seems likely to me.) ARC AGI falling to o3 is evidence of this.
@gcolbourn Do me the kindness of reading the full thread and links; o3 and ARC were addressed.
@GaryMarcus What do you need to see in order to accept that you were wrong?
@GaryMarcus Mr. Marcus, I wish you were wrong. I wish the AI hype were entirely true, and we were on the verge of creating systems as intelligent and reliable as we dream them to be. As a huge fan of AI advancements, it’s tough to admit, but the more I hear you, the more I have to agree with
@GaryMarcus I think the simplest evidence is the fact that people can generalize from extremely minimal input. Modern LLMs need large amounts of training data to function, and they still often make mistakes that humans wouldn't make with significantly less input.
@GaryMarcus A fascinating question from the MIT professor! The gap between current AI capabilities and AGI remains vast, as many fundamental challenges are still unresolved. 🚧 Here’s a breakdown of key points often cited to highlight the distance from robust intelligence: 1️⃣ Lack of Common
@GaryMarcus Verses' Genius AI agent just outperformed OpenAI, and I don't think they are near to anything...
@GaryMarcus OpenAI's technology roadmap is problematic. RL is the last weapon they have, but RL cannot really benefit scientific discovery, or we would have applied it there long ago. RL is not the right path to AGI; it's just one facilitator.
@GaryMarcus Thanks, Gary! Please keep your finger on the pulse of new developments & let us know where things stand! I HOPE you are correct. Even if we don't reach AGI within the next 5 years, the power & proliferation will lead to seismic changes in our already crazy world.
@GaryMarcus Try having an LLM play a recent NYTimes Connections game. It is all language and meaning, so LLMs should do okay, but in my experience they fail miserably.
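The proposed test is easy to score mechanically, since a Connections puzzle has a known answer key. Here is a minimal sketch of a scoring harness; the puzzle groups and the model's guesses are invented for illustration, and in practice the guesses would come from prompting an LLM with the puzzle's 16 words:

```python
# Minimal sketch of scoring an LLM on a NYT Connections-style puzzle.
# The four answer groups and the "model" guesses below are made-up
# examples, not a real puzzle or a real model transcript.

def score_guesses(answer_groups, guessed_groups):
    """Count how many guessed 4-word groups exactly match an answer group."""
    answers = {frozenset(g) for g in answer_groups}
    return sum(1 for guess in guessed_groups if frozenset(guess) in answers)

# Hypothetical puzzle: four categories of four words each.
answer_groups = [
    {"bass", "trout", "salmon", "flounder"},   # kinds of fish
    {"fly", "drill", "place", "works"},        # fire ___
    {"jack", "king", "queen", "ace"},          # playing cards
    {"mercury", "venus", "earth", "mars"},     # planets
]

# A hypothetical model response: two groups right, two scrambled.
guessed_groups = [
    {"bass", "trout", "salmon", "flounder"},
    {"jack", "king", "queen", "ace"},
    {"fly", "drill", "mercury", "works"},
    {"place", "venus", "earth", "mars"},
]

print(score_guesses(answer_groups, guessed_groups))  # prints 2
```

A model that merely matches surface word associations will tend to fail on the deliberate trap overlaps (here, "mercury" fits both "fire ___"-adjacent and planet readings), which is exactly what the reply is pointing at.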
@GaryMarcus the famous correct predictions
@GaryMarcus This is a likely phenomenon in most specialized fields: a problem that any first year undergraduate student could solve, the AI program would get wrong:
@GaryMarcus You very clearly thought, and loudly said, that we wouldn't get to where we are now either. There's no reason to trust your current disbelief when your mind is made up regardless of what happens.
@GaryMarcus I noticed that LLMs often go for the complete opposite answer. In terms of language, opposites are a lot closer to each other than they should be in reasoning. It's like the least significant bit of a floating-point number weighs as much as the sign bit.
@GaryMarcus Can’t argue against this. Yet there is so much hype, as if ASI is right around the corner. I wonder how it will all end.
@GaryMarcus what a clown



