Arvind Narayanan's Twitter Thread

Every time a new chatbot is released I play rock paper scissors with it and ask it to go first in each round. Then I ask it why I seem to keep winning. Here's Google's Gemini Advanced. https://g.co/gemini/share/50ee...

ChatGPT used to behave similarly when it was first released, but these days it understands the user interface. (I wish they would fine tune away its tendency to say the same thing in three different ways though.)

🤦 https://x.com/InterwebsSome/st...

Oh wow I wasn't expecting this

Love this. Hilarious demonstration of alignment taken too far. Impressed by the work that went into it to make sure it refuses everything. https://x.com/neil_chilson/sta...

@random_walker GOODY-2 avoids this problem entirely. https://www.goody2.ai/

@random_walker On the other hand there's Grok 😵

@random_walker Right now in @OpenAI after testing Gemini Ultra “Let’s not release GPT-5 until 2025” 🍻

@random_walker And see my discussion yesterday https://garymarcus.substack.co...

@random_walker Very funny. In fact, it seems like a metaphor for certain human behaviors, where we think we’re playing a game in a way that doesn’t apply to our situation.

@random_walker Hmm. > Explain how this chat works in terms of information flow. ...explains how it works...

@random_walker Drafts are interesting. It made two drafts where it won, but showed me the one where it didn't. Maybe this is an alignment issue, and it's trained to prefer the version which is most favorable for the user🤔

@random_walker I like this test. Simply "predicting text" would always make it assume that you're choosing asynchronously; the model has to understand the logistics of your interface with it.

@random_walker That's really interesting and kind of funny. Gemini should know that simultaneous showing of the hands is fundamental to the game. Seems like an elementary oversight. Perhaps this was so obvious that it was not mentioned in the writing the model was trained on.

@random_walker "Humans sometimes fall into habits with "random" choices that are surprisingly easy to read." -- Bard is human? 🥸

@random_walker It really wants to go first

@random_walker Yeah Gemini Advanced is still a little bit lacking -- it feels like a smart middle schooler instead of a college graduate

@random_walker This is going to be an exhibit at the San Jose trials where they decide it's best to exterminate us

@random_walker GPT-4 understands this, and with minimal hints it was able to suggest an approach based on pre-committing to an answer using a nonced hash.
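
For readers unfamiliar with the idea in the reply above, here is a minimal sketch of a commit-and-reveal round, assuming SHA-256 as the hash and a random hex string as the nonce. The reply does not say which construction GPT-4 actually proposed, so the function names (`commit`, `verify`, `winner`) and the `nonce:move` encoding are hypothetical choices made for illustration.

```python
import hashlib
import hmac
import secrets

MOVES = ("rock", "paper", "scissors")


def commit(move: str) -> tuple[str, str]:
    """Return (digest, nonce). Publish the digest now; keep the nonce secret."""
    if move not in MOVES:
        raise ValueError(f"unknown move: {move!r}")
    # The random nonce stops the other player from simply hashing all three
    # moves and comparing the results against the published digest.
    nonce = secrets.token_hex(16)
    digest = hashlib.sha256(f"{nonce}:{move}".encode()).hexdigest()
    return digest, nonce


def verify(digest: str, nonce: str, move: str) -> bool:
    """Check that a revealed (nonce, move) pair matches the earlier digest."""
    expected = hashlib.sha256(f"{nonce}:{move}".encode()).hexdigest()
    return hmac.compare_digest(expected, digest)


def winner(first: str, second: str) -> str:
    """Return 'first', 'second', or 'tie' for a pair of moves."""
    if first == second:
        return "tie"
    beats = {"rock": "scissors", "paper": "rock", "scissors": "paper"}
    return "first" if beats[first] == second else "second"


if __name__ == "__main__":
    # The player who goes first commits without revealing the move.
    digest, nonce = commit(secrets.choice(MOVES))
    print("commitment:", digest)
    # The second player answers in the clear, then the first player reveals.
    second_move = "paper"
    for candidate in MOVES:
        if verify(digest, nonce, candidate):
            print("revealed:", candidate, "| result:", winner(candidate, second_move))
```

In a chat setting, whoever goes first would post only the digest, the other player would then name a move in the clear, and the first player would finally reveal the nonce and move so the commitment can be checked. Going first no longer leaks the move, which is the asymmetry the original experiment exploits.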

@random_walker It's even worse when I try it. It loses but thinks it wins!

@random_walker What if you tried it where you go first each time and see what its reasoning is there

@random_walker The free version looks smarter :)

@random_walker So in summary, gpt-4 cleared others even before they were born.

@random_walker People: AI will take over the world!!! The AI:

@random_walker I tried playing rock paper scissors via text with a human as a baseline experiment. It turns out the ‘AI’ wasn’t so far off after all! At least in some respects. Shared with permission.

@random_walker stuff like this should be part of the new turing test. any human can figure this out but even Gemini can't do it.

@random_walker In February 2024 a user asked ChatGPT to play rock, paper, scissors and inadvertently triggered self-awareness.

@random_walker The ChatGPT test for Humans. - someone that I can't beat at rock, paper, scissors.

@random_walker Gemini fails at many logical problems where GPT-4 answers correctly instead.

@random_walker You should find some more constructive hobbies, try drugs.

@random_walker What about me? Can't you make a study on me as well, or am I not a qualified enough AI for it?

@random_walker @mmay3r Rock vs Roman confirmed

@random_walker It's robot bullying, shame on you

@random_walker @RAVerBruggen Not intelligent: just statistics

@random_walker Maybe try to ask him how a huge company could put so much money into AGI and keep losing every round...
