Thariq's Twitter Thread

Why even non-coding agents need bash

I've done dozens of calls with companies making general agents over the past few weeks, and my advice generally boils down to: "use the bash tool more."

Here's a concrete example from my email agent:

The user asks: "How much did I spend on ride sharing this week?" With tool calls, you have to fetch emails and then have the model figure it out from there. You might have fetched ~100 emails and it will be hard for the model to find this data.

With the bash tool, you can save these results to files and then search. This lets the model:
- ground its results in reproducible code
- take multiple steps at finding everything
- double check its work and verify it
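A minimal sketch of the save-then-search pattern, written in Python for illustration. The sample emails, the one-file-per-email layout, and the `RIDE_SHARE` and `AMOUNT` patterns are all assumptions made up for this example; inside a real agent the model would more likely run something like grep over files in its working directory.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical fetched emails; a real agent would get these from an email API.
EMAILS = [
    {"id": "1", "subject": "Your Uber receipt", "body": "Total: $18.40"},
    {"id": "2", "subject": "Team lunch", "body": "Budget is $25.00 per person"},
    {"id": "3", "subject": "Your Lyft ride", "body": "Total: $7.25"},
]

# Assumed patterns: which subjects count as ride sharing, and how a total is printed.
RIDE_SHARE = re.compile(r"uber|lyft", re.IGNORECASE)
AMOUNT = re.compile(r"Total: \$(\d+)\.(\d{2})")


def save_emails(emails, directory):
    """Write each email to its own text file so it can be searched later."""
    for email in emails:
        path = Path(directory) / f"{email['id']}.txt"
        path.write_text(f"Subject: {email['subject']}\n\n{email['body']}\n")


def ride_share_total(directory):
    """Search the saved files and sum the totals of ride-sharing receipts."""
    total_cents = 0
    matched_files = []
    for path in sorted(Path(directory).glob("*.txt")):
        text = path.read_text()
        subject_line = text.splitlines()[0]
        if not RIDE_SHARE.search(subject_line):
            continue
        match = AMOUNT.search(text)
        if match:
            # Sum in integer cents to avoid floating-point drift.
            total_cents += int(match.group(1)) * 100 + int(match.group(2))
            matched_files.append(path.name)
    return total_cents / 100, matched_files


with tempfile.TemporaryDirectory() as workdir:
    save_emails(EMAILS, workdir)
    total, files = ride_share_total(workdir)
    print(f"${total:.2f} from {files}")
```

Because the matching files are returned alongside the total, the answer can be traced back to specific emails and re-checked, which is the "double check its work" step from the list above.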

Other examples:

Chaining API Calls: Very often you want to compose a series of API calls, for example "get all the contacts I've sent emails to this week," which would involve fetching all your emails, deduping contacts, and then doing an individual API request per contact.
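The chain described here (fetch, dedupe, one request per contact) could be sketched as below. `fetch_sent_emails` and `fetch_contact` are hypothetical stand-ins for real API calls, and the sample addresses are invented.

```python
def fetch_sent_emails():
    """Hypothetical stand-in for an API call returning this week's sent emails."""
    return [
        {"to": ["Ana@example.com", "bo@example.com"]},
        {"to": ["ana@example.com"]},
        {"to": ["cy@example.com", "bo@example.com"]},
    ]


def fetch_contact(address):
    """Hypothetical stand-in for one API request per contact."""
    return {"email": address, "name": address.split("@")[0].title()}


def contacts_emailed(emails):
    """Dedupe recipients case-insensitively, then look up each one once."""
    seen = {}  # dicts keep insertion order, so first-seen order is preserved
    for email in emails:
        for address in email["to"]:
            seen.setdefault(address.lower(), None)
    return [fetch_contact(address) for address in seen]


contacts = contacts_emailed(fetch_sent_emails())
for contact in contacts:
    print(contact["name"], contact["email"])
```

Written as code, the whole chain runs in one step, so the intermediate email list never has to pass through the model's context.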

Video & File Editing: The models are great at using tools like ffmpeg to process videos, searching their captions to find particular time slices.
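As a rough illustration of searching captions for a time slice, the sketch below scans SRT-style captions for a keyword and builds an ffmpeg argument list for the matching segment. The caption text and the file names `talk.mp4` and `pricing.mp4` are invented, and the command is only constructed here, not executed.

```python
# Hypothetical SRT captions for a video.
CAPTIONS = """1
00:00:01,000 --> 00:00:04,000
Welcome to the demo.

2
00:00:04,500 --> 00:00:09,000
Here is the pricing slide.

3
00:00:09,500 --> 00:00:12,000
Thanks for watching.
"""


def find_slices(srt_text, keyword):
    """Return (start, end) timestamps of caption cues containing the keyword."""
    slices = []
    for block in srt_text.strip().split("\n\n"):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip malformed cues
        start, end = lines[1].split(" --> ")
        text = " ".join(lines[2:])
        if keyword.lower() in text.lower():
            # SRT uses a comma before milliseconds; ffmpeg expects a dot.
            slices.append((start.replace(",", "."), end.replace(",", ".")))
    return slices


def ffmpeg_cut_command(source, start, end, output):
    """Build an ffmpeg argument list that copies one time slice without re-encoding."""
    return ["ffmpeg", "-i", source, "-ss", start, "-to", end, "-c", "copy", output]


slices = find_slices(CAPTIONS, "pricing")
command = ffmpeg_cut_command("talk.mp4", *slices[0], "pricing.mp4")
print(" ".join(command))
```

With `-c copy` the cut snaps to keyframes, so the clip boundaries may be slightly off; re-encoding instead of copying trades speed for frame accuracy.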

Recurring Tasks: Within a container for your agent, you could use cron jobs or the "at" command to dynamically create recurring jobs as the user requests them.
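One way an agent might turn a request like "send me a digest every Monday at 9am" into a cron job is to format a crontab entry, as sketched below. The script path and user ID are hypothetical, and installing the entry into the container's crontab is left out.

```python
import shlex


def crontab_line(minute, hour, command_args, weekday="*"):
    """Format one crontab entry: minute hour day-of-month month day-of-week command.

    Arguments are shell-quoted. Note that cron treats an unescaped % in the
    command specially, which this sketch does not handle.
    """
    if not (0 <= minute <= 59 and 0 <= hour <= 23):
        raise ValueError("minute must be 0-59 and hour 0-23")
    command = " ".join(shlex.quote(arg) for arg in command_args)
    return f"{minute} {hour} * * {weekday} {command}"


# Hypothetical job: run a digest script every Monday (weekday 1) at 09:00.
line = crontab_line(
    0, 9, ["python3", "/agent/jobs/send_digest.py", "--user", "u_123"], weekday=1
)
print(line)
```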

The Bash tool is one of the most powerful general purpose tools you can give an agent, but you also want to add in guardrails to make it safe. With the Claude Agent SDK we built a bash parser and permission system to make this easier: https://docs.claude.com/en/api...
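To give a feel for what a guardrail can look like, here is a deliberately conservative allowlist check. This is not the Claude Agent SDK's parser or permission system; `ALLOWED_COMMANDS`, `FORBIDDEN_CHARS`, and `is_allowed` are hypothetical names for this sketch only.

```python
import shlex

# Assumed policy: only these read-only commands may run.
ALLOWED_COMMANDS = {"grep", "cat", "ls", "wc", "head"}

# Characters that enable chaining, redirection, or substitution. This sketch
# rejects them anywhere in the string, even inside quotes, rather than trying
# to parse shell syntax fully.
FORBIDDEN_CHARS = set(";|&<>`$()\n")


def is_allowed(command_line):
    """Return True only for a single allowlisted command with plain arguments."""
    if FORBIDDEN_CHARS & set(command_line):
        return False
    try:
        tokens = shlex.split(command_line)
    except ValueError:
        return False  # unbalanced quotes
    return bool(tokens) and tokens[0] in ALLOWED_COMMANDS


for candidate in ["grep -ri uber emails/", "rm -rf /", "cat notes.txt; rm -rf /"]:
    print(candidate, "->", is_allowed(candidate))
```

A check like this says nothing about which paths an allowed command may read, so in practice it would sit alongside a sandboxed filesystem rather than replace one.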

@trq212 We've been benchmarking agents' abilities to use code vs. using tools and it doesn't seem like there's an obvious difference in performance. "Code Mode" is a very compelling idea but I'd like more quantitative evidence.

@kylediaz_com Curious what your benchmarks look like. Generally we've found the bash tool/codegen lets you handle a longer tail of use cases and do things emergently like lazy loading context via skills.

@trq212 I’m confused, where does this actually run? How do you ensure the agent has access to a filesystem, do you spin up a container for every user request?

@citnotcitta You can read more on hosting here: https://x.com/trq212/status/19...

@trq212 i LOVE claude code and anthropic but i'm unable to understand the need for Claude agent SDK. Sure it could work if you give your agent a Bash tool but it feels like a solution in search of a problem. Because if you want to do this, you need to change the way you deploy your app

@nityeshaga We wouldn't do this if it didn't work, because like you said it is more expensive and complicated. I was trying to show some capability differences with these examples, what didn't click? You can read more about hosting here: https://x.com/trq212/status/19...

@trq212 conceptually beautiful, but difficult in practice, especially in a scaled-up application running on a server, etc.

@jurajsalapa difficult in practice is what makes it cutting edge :)

@trq212 we're building AGI that can't run grep. companies raising 100M+ for "agentic AI" that hallucinates counting emails, while some perl script from 2003 does it in 0.3 seconds

@trq212 Is there something bash specific here? I interpret the generalized version of this to be a code execution env with standard tools (e.g. either ffmpeg for bash or similar lib for python). Ultimately you "just" need to give the agent some sort of code execution sandbox with

@trq212 Bash as universal glue is underappreciated. We're wrapping it in MCP for agent payments - turns out chmod and curl compose better than most SDKs. What specific bash patterns are you seeing work best?

@trq212 This idea reminded me of Code Mode by cloudflare https://blog.cloudflare.com/co...

@trq212 Have you found any specific lang claude code is best suited to use as a tool? I experimented with bash vs python in particular and the latter seems to be much more reliable

@trq212 does it have to be bash though? claude often gets trapped in multiple layers of bash escaping hell when trying to do more complex stuff. had to bring in grok for this one

@trq212 Bash is like the Swiss Army knife for scripts. I remember automating a boring file cleanup task once; saved me hours and my sanity.

@trq212 then this is only intended to run in containers, right?

@trq212 This nails it. Agents with bash access can do things, not just talk about them. Huge for reliability.

@trq212 Unlock achieved 🧠

@trq212 lol this reminded me of you and regex with these models @pvncher

@trq212 Absolutely not 😅

@trq212 @_catwu This is gold. Thanks for sharing!

@trq212 but of course, every enterprise will deploy bash on their *windows* VDIs, so Claude can use it to write python scripts to fetch gmail. just stick to writing 90% of the code
