I was one of the 16 devs in this study, and I want to share my thoughts on the causes of, and mitigation strategies for, dev slowdown. As a "why listen to you?" hook: I experienced a -38% AI speedup (i.e., a 38% slowdown) on my assigned issues. I think transparency helps the community.
First, I think AI speedup is only weakly correlated with anyone's ability as a dev. All the devs in this study are very good. I think it has more to do with falling into failure modes, both in the LLM's capabilities and in the human's workflow. I work with a ton of amazing pretraining engineers, and I see the same failure modes there too.
I think LLM overuse happens because it's easy to optimize for perceived enjoyment rather than time-to-solution while working. Me pressing tab in Cursor for 5 hours instead of debugging for 1:
Second, today's LLMs have super spiky capability distributions. I think this has more to do with 1) which coding tasks we have lots of clean data for, and 2) which benchmarks/evals LLM labs use to measure success. As an example, LLMs are all horrible at low-level systems code.
Along this point, there's a long tail of issues that cause an LLM to choke: - "Context rot", where models become distracted by long, irrelevant context (especially from long conversations). See https://x.com/simonw/status/19... You need to open a new chat often. This effect worsens the longer a single conversation drags on.
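The "open a new chat often" habit can even be automated. Here's a minimal sketch of that idea: a chat wrapper that resets its history once the accumulated context crosses a rough token budget, rather than letting one conversation rot indefinitely. Everything here is hypothetical, `send_to_llm` is a stand-in for whatever client you actually use, and the 4-chars-per-token heuristic is just a ballpark.

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return len(text) // 4


def send_to_llm(history: list[str]) -> str:
    # Placeholder for a real API call (e.g., your provider's chat client).
    return "(model reply)"


class ChatSession:
    """Chat wrapper that starts a fresh conversation when context gets bloated."""

    def __init__(self, token_budget: int = 8000):
        self.token_budget = token_budget
        self.history: list[str] = []

    def ask(self, prompt: str) -> str:
        used = sum(approx_tokens(m) for m in self.history)
        # If the old context would blow the budget, "open a new chat":
        # drop stale turns instead of dragging them into the next question.
        if used + approx_tokens(prompt) > self.token_budget:
            self.history = []
        self.history.append(prompt)
        reply = send_to_llm(self.history)
        self.history.append(reply)
        return reply
```

A smarter version might summarize the old history into one message before resetting, but the point is the same: don't let irrelevant turns accumulate.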
Third, it's super easy to get distracted in the downtime while LLMs are generating. The social media attention economy is brutal, and I think people spend 30 mins scrolling while "waiting" for their 30-second generation. All I can say on this one is that we should know our own attention limits.
LLMs are a tool, and we need to start learning their pitfalls and building some self-awareness. A big reason people enjoy @karpathy's talks is that he's a highly introspective LLM user, which he arrived at a bit early due to his involvement in pretraining some of them. If we develop that same introspection, we'll get a lot more out of these tools.
Some final statements: - METR is a wonderful organization to work with, and they are strong scientists. I've loved both participating in this study and reading their results. - I am not some LLM guru trying to preach. Think of this as me publishing a personal diary entry, and take from it whatever is useful.

