Published: June 20, 2024

Claude is starting to get really good at coding and autonomously fixing pull requests. It's becoming clear that in a year's time, a large percentage of code will be written by LLMs. Let me show you what I mean:

To start, if you want to see Claude 3.5 Sonnet in action solving a simple pull request, here's a quick demo video we made. (voiceover by the one and only @sumbhavsethia)

In our internal pull request eval, Claude 3.5 Sonnet passed 64% of our test cases. For comparison, Claude 3 Opus passed only 38%.

Image in tweet by Alex Albert

3.5 Sonnet performed so well that it almost felt like it was playing with us on some of the test cases. It would find the bug, fix it, and spend the rest of its output tokens going back and updating the repo documentation and code comments.

Side note: With Claude's coding skills plus Artifacts, I've already stopped using most simple chart, diagram, and visualization software. I made the chart above in just 2 messages.

Image in tweet by Alex Albert

Back to PRs, Claude 3.5 Sonnet is the first model I've seen change the timelines of some of the best engineers I know. This is a real quote from one of our engineers after Claude 3.5 Sonnet fixed a bug in an open source library they were using.

Image in tweet by Alex Albert

At Anthropic, everyone from non-technical people with no coding experience to tenured SWEs now uses Claude to write code that saves them hours of time. Claude makes you feel like you have superpowers; suddenly no problem is too ambitious. The future of programming is here, folks.

@alexalbert__ Artifacts are going to be a game changer for non-technical LLM users who want to build. Until now, being able to actually run the code to view the output has been a barrier for people with no code experience.

Image in tweet by Alex Albert
Image in tweet by Alex Albert

@alexalbert__ It just made a simple react jsx contact form and managed to run it in the Artifacts playground 🤯

Image in tweet by Alex Albert

@alexalbert__ Claude 3.5 is great at improving code. See the first iteration of a tic-tac-toe game and the second, which adds score tracking and improves the UI design as requested. Iteratively improving output is important with LLMs and Claude 3.5 does it well.

Image in tweet by Alex Albert
Image in tweet by Alex Albert

@alexalbert__ A recent GitHub survey already showed that more than 80% of coders use AI to generate their code.

@alexalbert__ How does it do on SWEBench lite?

@alexalbert__ Can anyone set up the Claude 3.5 api to open files and execute code? Do you just need to give it a prompt that teaches it how to say specific words to do stuff that the environment then picks up as special actions?

@alexalbert__ Do you have a repo for the agentic workflow you can share?

@alexalbert__ Our thoughts exactly: AI will make us all 10x developers. https://www.bvp.com/atlas/stat... h/t @kentbennett, @LindseyLi_, @bhavikvnagda

@alexalbert__ I think PRs are the wrong paradigm for LLM generated software

@alexalbert__ Fantastic! Finally, a capable model on AWS that can serve AI-empowered dev tools. The agentic flow in the demo is neat. We need foundation models capable of working well for these simple cases for more sophisticated solutions (like @CodiumAI) to work on real-world complex cases.

Image in tweet by Alex Albert

@alexalbert__ claude is my way to go for coding tasks, sometimes playing w/ it on http://microlaunch.net after the sonnet release, planning to migrate from gpt-4o to claude now

@alexalbert__ Plz help me write an XML ETL 🙏🙏

@alexalbert__ it should be 70-80% today tbh

@alexalbert__ @n_s_bradford how have you seen Claude perform? do you use a mixture?

@alexalbert__ Towards which direction are we progressing? Improving Developer Productivity or Replacing Developers 🤨

@alexalbert__ Is there any easy way to integrate this in Zed or VSCode? One of our products are a few thousands, and could be fun to generate a test suite on!

@alexalbert__ We have Opus already working in exception-retry loops with an online learning system, and inside a goal-directed counter agent btw

@alexalbert__ @IamRamenPanda Coding? Im more of an ideas guy.

@alexalbert__ I’m very excited for this. 3 Opus has been my go-to for coding. I’ve found it to be better than GPT4 and definitely better than 4o. If 3.5 Sonnet is as good as you say…omgggg 🚀

@alexalbert__ On the other hand, I see Claude's creative abilities depleting a little bit over the past few months. Is it just me? Mostly, it takes multiple prompts and iterations to get the desired results in comparison to fewer prompts earlier.

@alexalbert__ Thanks for posting this. Is there any documentation to configure vscode as described in the video? I will check out the anthropic site, but just curious if anyone has a shortcut? If this works, looking forward to cancelling my github copilot!

@alexalbert__ I enjoy using Claude to help me code but it hallucinates a LOT. Only slightly less than Llama 2. That's why I mostly prefer perplexity where I can verify quickly by reading the sources.

@alexalbert__ It's fascinating to see how quickly LLMs like Claude are evolving in coding abilities. While automation is gaining ground, I think human creativity and problem-solving will remain crucial.

@alexalbert__ @gustavo_pch Blocked by region 🤬

@alexalbert__ 6 more months guys, just 6 more months!

@alexalbert__ They are mostly trash at complex stuff, but really good at the boring and pedantic shovelling tasks.

@alexalbert__ That's amazing! The last missing link is adding web search. It's the only thing stopping me from switching my personal assistant from ChatGPT.

@alexalbert__ Thanks for guiding

@alexalbert__ When is 3.5 Sonnet available for subscribers?

@alexalbert__ I'd love to be able to connect Claude to my codebase via IntelliJ -- have it read through my code, optimize, generate new code, etc. Imagine having it build Django modules, or make all your CSS responsive... or build, deploy and evaluate computer vision/AI models...
