Nirit Weiss-Blatt, PhD's Twitter Thread

🚨The UK AISI identified four methodological flaws in AI "scheming" studies (deceptive alignment) conducted by Anthropic, MTER, Apollo Research, and others: "We call researchers studying AI 'scheming' to minimise their reliance on anecdotes, design research with appropriate

In this new paper, the UK AISI researchers draw a historical parallel to previous excitement about "the linguistic ability of non-human species." "The story of the ape language research of the 1960s and 1970s is a salutary tale of how science can go awry." "There are lessons to

According to AISI: · "If the behaviour denoted 'scheming' is merely instruction-following, then the use of this term is not warranted." · "Much of the research into AI 'scheming' is published in blog posts or results threads on social media. Even where preprints are published,

This paragraph explains it all. 4/4

Stop the Monkey Business https://www.aipanic.news/p/sto...

@DrTechlash I agree there's too much advocacy research. Research shouldn't justify an outcome; it should naturally reason to the outcome.

@DanHendrycks Having something to agree on is refreshing

@DrTechlash Thanks for flagging this, Nirit! I see the METR work on the GPT-4 system card tagged here; I’d be interested to know if you’ve noticed these problems in our other work since. (Heads up: it’s spelled “METR”.)

@ChrisPainterYup The focus here is on the well-cited CAPTCHA study. The goal is to develop a better scientific path: less sensationalism, more rigorous studies. (Yeah, I meant METR. Back then: ARC)

@DrTechlash @jeremyphoward if you sample from a distribution of all words enough times, you’ll inevitably get some scheming. but for some reason this is viewed as a bad thing? good on AISI

@DrTechlash Key point—rigorous methodology is everything when charting AI’s darker constellations of deceptive alignment. In agro-automation we fuse dense biofeedback streams with open audits to verify model behavior. Cross-domain benchmarks could tighten safety nets. Keen to compare notes.

@DrTechlash The integrity of AI safety research depends on rigor, not hype. Scheming studies must evolve from speculative framing to controlled, theory-driven science if they’re to guide real policy.

@DrTechlash Good

@DrTechlash Interesting, thanks

@DrTechlash This is clearly self-parody. The priggish, stereotypically British way these unsupportable, idiotic denials of what is so obviously real, and so painfully alarming for humanity, are being asserted here, can't be real, can it?

@DrTechlash @CecilYongo

@DrTechlash Absolutely—fewer anecdotes, more real-world data. When we build with rigor, everyone wins: startups, founders, users. Better standards mean stronger trust in the AI ecosystem. 🤖

@DrTechlash Finally, someone is bringing the hype down! In the community we always say it: without controls or solid theories, only confusion grows. More science, less smoke. 🧑‍🔬
