Who's using GPT-OSS and for what? Was it cheaper, better, or faster than other open models? Or just not from China? Download numbers on HuggingFace are actually very strong for a first model release.
@natolambert GPT-OSS 120B for quick coding questions. I use it in Cline in VS Code. It is pretty good, and powerful enough to drive a coding agent like Cline.
@natolambert Using GPT-OSS for eval pipelines and overnight doc tagging. Cheaper on long batches, predictable quotas. Not faster than Qwen2.5-Instruct locally; the main win is control, latency SLOs, and spend caps.
@natolambert
MXFP4
Not from China
Fast
"Ok" at most things
Easy to uncensor
It's pretty useful
@natolambert Using it in my home lab for processing personal documents. Better and faster responses than Qwen 2.5 32B.
@natolambert It is the best open model at tool calling. With a bit of fine-tuning, it can be a great coordinating agent!
@natolambert Wonder if it is partly b/c this playground makes it easy to download and run: https://gpt-oss.com
@natolambert
1. Cannot run on consumer hardware without some kind of workaround
2. Is not multimodal
3. Is "ok" for most tasks
Gemma 3 is better.
@natolambert A few reasons:
- 120B hallucinates less in my experience compared to Qwen3-32B.
- It is the best model currently available on Groq for latency-sensitive use cases. Qwen3 235B is not yet available on Groq.
@natolambert It's an amazing reasoning/tool-use model. If you're not using it for anything creative, I would say it is an o4-mini-level model. We're using it for data generation/curation with great success. Hope OpenAI continues releasing/updating it. P.S. Use it with the Responses API only!
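A minimal sketch of that Responses API setup, assuming gpt-oss is served behind an OpenAI-compatible endpoint; the base_url, api_key, and model name here are illustrative assumptions, not the poster's actual config:

```python
from openai import OpenAI

# Assumption: gpt-oss-120b is served locally (e.g. by vLLM) behind an
# OpenAI-compatible Responses API at this address.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

response = client.responses.create(
    model="openai/gpt-oss-120b",
    input="List three tools you could call to answer a weather question.",
)
print(response.output_text)
```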
@natolambert I use it on Cerebras for latency-sensitive tasks that only need OK intelligence.
@natolambert I've found the 120B variant to be pretty strong for reasoning, tool calling and basic agentic coding tasks when paired with the Codex CLI tool.
@natolambert I'm using it as an LLM-as-judge. The 20B one is a nice, convenient size (about 14 GB of RAM), and cheap and fast thanks to the MXFP4 optimization.
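A rough sketch of what that LLM-as-judge loop might look like, assuming gpt-oss-20b runs locally behind an OpenAI-compatible endpoint; the base_url, model name, and 1-5 rubric are assumptions for illustration:

```python
from openai import OpenAI

# Assumption: gpt-oss-20b served locally via an OpenAI-compatible API.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

JUDGE_PROMPT = """Rate the answer below for factual accuracy on a 1-5 scale.
Reply with a single integer and nothing else.

Question: {question}
Answer: {answer}"""

def judge(question: str, answer: str) -> int:
    # Temperature 0 keeps the judge's scores as deterministic as possible.
    completion = client.chat.completions.create(
        model="openai/gpt-oss-20b",
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(question=question, answer=answer)}],
        temperature=0,
    )
    return int(completion.choices[0].message.content.strip())

print(judge("What year did the Apollo 11 landing happen?", "1969"))
```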
@natolambert I find it incredibly good at making very plausible synthetic data. I've used it a ton; it accounted for 60% of my inference on OpenRouter for a week in August.
@natolambert I'm using it as an LLM-as-a-judge. Very, very fast on my laptop and much better than a comparably quick Qwen for me.
@natolambert The only thing I'm thinking about is whether the number of active params in even the 120B model is enough to get solid results during post-training.
@natolambert I am using it to build an equity research copilot. The reason I like it is that it's available at great speeds on @GroqInc, which helps with latency. Also, the performance is great for my task.
@natolambert It is the only non-Chinese thinking model that is OSS. It is VERY strong on general thinking. I couldn't FT it effectively; I'm pretty sure that's a skill issue, and I'm looking forward to good walkthroughs.
@natolambert We are using 20B for schema extraction.
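A minimal sketch of schema extraction with the 20B model, assuming an OpenAI-compatible endpoint that supports JSON-object responses; the endpoint, model name, and schema are illustrative assumptions:

```python
import json
from openai import OpenAI

# Assumption: gpt-oss-20b served locally via an OpenAI-compatible API.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

# Hypothetical target schema, described inline in the prompt.
SCHEMA = {"name": "string", "date": "YYYY-MM-DD", "amount": "number"}

def extract(document: str) -> dict:
    completion = client.chat.completions.create(
        model="openai/gpt-oss-20b",
        messages=[{
            "role": "user",
            "content": f"Extract JSON matching {json.dumps(SCHEMA)} from:\n{document}",
        }],
        # Constrain the output to valid JSON where the server supports it.
        response_format={"type": "json_object"},
    )
    return json.loads(completion.choices[0].message.content)

print(extract("Invoice from Acme Corp dated 2025-03-14 for $1,200."))
```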
@natolambert I think the mxfp4 quantization by default made it very accessible
@natolambert @TheZachMueller I use it for agentic search and I’m actually happy with it. It’s fast and knows tools
@natolambert Great for collecting multi-turn synthetic data such as complex QA search.
@natolambert finetunes really well and stays quite smart https://huggingface.co/Tonic/m...
@natolambert I like the pre-training OpenAI does. Qwen and DeepSeek are good on benchmarks, but OA's models are the most practical in real-world scenarios and generalise better. Their pre-training covers more diverse data, and the models are refined. On the contrary, both Q and D have been historically bad
@natolambert I imagine it will be quite strong at universities. Where I’m at we are not allowed to use Chinese models, no matter where they are hosted, etc.
@natolambert On Cerebras it is super fast and I don't get throttled. If I try to use Qwen or DeepSeek, it gets throttled right away. gpt-oss is so cheap and fast I can make multiple calls in parallel to check and retry as needed. And still get an answer back to the user in < 10 seconds.
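A sketch of the parallel call-check-retry pattern described here, assuming gpt-oss on Cerebras' OpenAI-compatible API; the base_url, model name, and validity check are assumptions for illustration:

```python
import asyncio
from openai import AsyncOpenAI

# Assumption: Cerebras exposes gpt-oss-120b via an OpenAI-compatible API.
client = AsyncOpenAI(base_url="https://api.cerebras.ai/v1", api_key="YOUR_KEY")

async def ask(prompt: str) -> str:
    completion = await client.chat.completions.create(
        model="gpt-oss-120b",
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content

def looks_ok(answer: str) -> bool:
    # Placeholder check; a real validator would be task-specific.
    return bool(answer) and len(answer) < 2000

async def ask_with_retries(prompt: str, attempts: int = 3) -> str:
    # Fire several attempts in parallel and keep the first that passes.
    results = await asyncio.gather(*(ask(prompt) for _ in range(attempts)))
    for answer in results:
        if looks_ok(answer):
            return answer
    return results[0]

print(asyncio.run(ask_with_retries("Summarize RFC 2616 in one sentence.")))
```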
@natolambert Qwen is just better in every measurable way
@natolambert I tried the 20b one. It was thinking for about an actual hour and then returned meh results. Kinda interesting. Heard of others having that issue as well
@natolambert Our free tier chat on Groq. Tokens go brrrr.
@natolambert Definitely better and faster in my experience, I'm waiting on @lmstudio to add MCPs to headless mode to be able to use it daily. The smaller model was the first that seemed able to reliably do some tasks locally for me, particularly with MCPs.
@natolambert I liked GPT-OSS 20B way more than Qwen 3 8B; think it was faster too. But it gets very concerned when you say anything remotely close to being sad. GPT-OSS 120B runs nicely on my MacBook and has good knowledge.
@natolambert It is the fastest model of that size on AWS Bedrock by a MILE (120B); the only thing with the same speed is something like Nova Micro. We are currently using it in a couple of products, since we are measuring more or less 320 tok/s.
@natolambert Data formatting and extraction. Really fast and cheap
@natolambert I tried fine-tuning the 20B and it wasn't very good. I was doing it to run on my local Mac, and 120B is too big for it. Any idea if there would be something in between, like 30-40B, in the near future?
@natolambert I use it because it’s the best Arabic llm in my opinion
@natolambert Using it for redacting sensitive info prior to sending that up to LLMs.
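A sketch of that local-redaction-first pattern, assuming gpt-oss runs locally behind an OpenAI-compatible server; the endpoint, model name, and redaction prompt are assumptions for illustration:

```python
from openai import OpenAI

# Assumption: a local gpt-oss-20b instance handles the sensitive text,
# so only the redacted version ever leaves the machine.
local = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

def redact(text: str) -> str:
    completion = local.chat.completions.create(
        model="openai/gpt-oss-20b",
        messages=[{
            "role": "user",
            "content": "Replace all names, emails, and account numbers "
                       f"with [REDACTED], changing nothing else:\n{text}",
        }],
        temperature=0,
    )
    return completion.choices[0].message.content

safe_text = redact("Email jane.doe@example.com about account 4481-2219.")
# safe_text can now be sent to a hosted LLM without leaking the originals.
print(safe_text)
```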
@natolambert I'm using it via Groq for the llm in my conversational voice stack.

