Published: December 6, 2022

How many GPUs does it take to run ChatGPT? And how expensive is it for OpenAI? Let’s find out! 🧵🤑

We don’t know the exact architecture of ChatGPT, but OpenAI has said that it is fine-tuned from a variant of GPT-3.5, so it probably has 175B parameters. That's pretty big.

How fast could it run? A 3-billion-parameter model can generate a token in about 6 ms on an A100 GPU (using half precision + TensorRT + activation caching). If we scale that up linearly to the size of ChatGPT, it should take about 350 ms for a single A100 GPU to print out one word.
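That scaling step can be checked with simple arithmetic (the 3B/6 ms figures come from the thread; the linear-scaling-with-parameters assumption is the same one the estimate relies on):

```python
# Back-of-envelope: assume per-token latency scales linearly with parameter count.
small_params = 3e9       # 3B reference model
small_latency_ms = 6.0   # ~6 ms/token on one A100 (fp16 + TensorRT + activation caching)
large_params = 175e9     # presumed ChatGPT size

large_latency_ms = small_latency_ms * (large_params / small_params)
print(f"{large_latency_ms:.0f} ms per token")  # ~350 ms on a single (hypothetical) A100
```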

Of course, you could never fit ChatGPT on a single GPU. You would need five 80GB A100 GPUs just to load the model and text. ChatGPT cranks out about 15-20 words per second. If it uses A100s, that could be done on an 8-GPU server (a likely choice on Azure cloud).
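The memory count works out like this (half precision means 2 bytes per parameter; activations and the KV cache would push the total higher, which is why this is a floor):

```python
import math

params = 175e9                  # presumed parameter count
bytes_per_param = 2             # fp16 / half precision
gpu_mem_gb = 80                 # A100 80GB card

model_gb = params * bytes_per_param / 1e9       # 350 GB of weights alone
gpus_needed = math.ceil(model_gb / gpu_mem_gb)  # 5 GPUs just to hold the weights
```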

So what would this cost to host? On Azure cloud, each A100 card costs about $3 an hour. That's $0.0003 per word generated. But it generates a lot of words! The model usually responds to my queries with ~30 words, which adds up to about 1 cent per query.
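The per-word and per-query numbers follow from the thread's assumptions (8 A100s at ~$3/hr each, ~20 words/sec, ~30-word replies):

```python
gpu_cost_per_hour = 3.0   # ~$3/hr per A100 on Azure
gpus = 8                  # one 8-GPU inference node
words_per_sec = 20        # observed ChatGPT output rate
words_per_reply = 30      # typical response length

node_cost_per_hour = gpu_cost_per_hour * gpus        # $24/hr
words_per_hour = words_per_sec * 3600                # 72,000 words
cost_per_word = node_cost_per_hour / words_per_hour  # ~$0.0003/word
cost_per_query = cost_per_word * words_per_reply     # ~1 cent/query
```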

ChatGPT acquired 1M users within its first 5 days of operation. If an average user has made 10 queries per day, I think it’s reasonable to estimate that ChatGPT serves ~10M queries per day. https://x.com/sama/status/1599...

I estimate the cost of running ChatGPT is $100K per day, or $3M per month. This is a back-of-the-envelope calculation. I assume nodes are always in use with a batch size of 1. In reality they probably batch during high volume, but have GPUs sitting fallow during low volume.
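Multiplying through with the thread's own figures (batch size 1, nodes always busy):

```python
cost_per_query = 0.01          # ~1 cent/query from the per-word estimate
queries_per_day = 10_000_000   # 1M users x ~10 queries/day

daily_cost = queries_per_day * cost_per_query  # ~$100K/day
monthly_cost = daily_cost * 30                 # ~$3M/month
```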

The real costs for a typical organization would almost certainly be higher than this because parallelization is not 100% efficient, GPUs are not 100% utilized, and my runtime estimate is optimistic.

The cost to OpenAI may be lower though, because of its partnership with Microsoft. Either way, that ain't cheap. Some say it's wasteful to pour these kinds of resources (and carbon) into a demo. But hey, it's not the worst use of Elon's money that we've seen of late 💸💸💸

Thanks to NLP gurus @jwkirchenbauer and @jonasgeiping for their inputs on this thread.

@tomgoldsteincs @tomgoldsteincs It takes only ~45 ms/token for 175B params. This is for BLOOM 176B; for more info, read the blog: https://huggingface.co/blog/bl... Also, I guess they would be running in int8 with fast fused kernels, so it's probably even faster

Image in tweet by Tom Goldstein

@Asuna_FPS_ Great blog post! Note that I said 350ms for a *single* A100 (if it had the memory). My estimate was a throughput of 20 words/sec on the whole node, which is 50ms per inference - very close to the 45ms you quote.

@tomgoldsteincs No doubt it’s costly, but this isn’t a simple marginal-cost exercise. The publicity and, more importantly, the human-in-the-loop training (i.e. RLHF) is valuable.

@pmehra23 Yep - OpenAI used to pay full-time human annotators. Now they are getting the human time for free. This may actually be a cost-saving measure for them in some sense.

@tomgoldsteincs What about the environmental cost? Curious how the new model compares to the original calculation by @strubell of 5 cars per one NLP model training

@OlyaKudina @strubell How much power does ChatGPT use? It takes at most 0.6 watt-hours to process a query. If my 10M queries/day estimate is correct (usage has almost certainly gone higher by now), then ChatGPT uses ~6,000 kilowatt-hours per day - enough to charge 100 Teslas from empty to full.
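The energy arithmetic behind that reply, with the Tesla comparison made explicit (the ~60 kWh battery capacity is my assumption for a typical Model 3 pack, not a figure from the thread):

```python
wh_per_query = 0.6             # upper-bound energy per query (thread's figure)
queries_per_day = 10_000_000   # from the earlier usage estimate

kwh_per_day = wh_per_query * queries_per_day / 1000  # ~6,000 kWh/day
tesla_battery_kwh = 60                               # assumed Model 3 pack size
teslas_charged = kwh_per_day / tesla_battery_kwh     # ~100 full charges
```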

@tomgoldsteincs i think your per-query estimates are on the high end and your total volume estimate is on the lower end. i would expect they are shipping a 30B-param model or smaller, doing some amount of caching, and getting 80 to 90% HFU.

@tomgoldsteincs just use $RNDR ;)

@tomgoldsteincs depends on the context window, but about 320GB with batch size 2 (context window at 512 tokens); that’s 16 x A100s (40GB size)

@tomgoldsteincs @karl_0x how are we even meant to compete

@tomgoldsteincs I would assume they run this almost for “free” given their connection to Microsoft/Azure… (No doubt they do some accounting wizardry to make it “appear” that they pay for it, for tax benefits though)

@tomgoldsteincs They tried to answer it in a vague way here: https://x.com/sama/status/1599...

@tomgoldsteincs Dang, that would be quite expensive. But I'd love to hear some numbers down the line from Sam. Mad impressive what they have shipped so far.

@tomgoldsteincs @KasperGroes Hi Tom, I’m writing about this - could I get a DM please? Thanks

@tomgoldsteincs I guess we are left with some estimates for now...

Image in tweet by Tom Goldstein

@tomgoldsteincs An OpenAI employee recently tweeted a tongue-in-cheek request for anyone with a few thousand A100s they could spare. At least there isn't as much crypto mining these days.

@tomgoldsteincs Likely cheaper per token. OpenAI's GPT-3 already dropped to 1/3 of its price a few weeks ago and still likely had a decent profit margin built in. ChatGPT was an attempt to utilize underused servers.

@tomgoldsteincs Most answered queries can be cached for sure...

@tomgoldsteincs Why would you run it on a GPU? The whole point is to train on a GPU or TPU but run on a CPU. What am I missing?

@tomgoldsteincs @akashnet_ fixes this ☁️ $AKT

@tomgoldsteincs Also ... why assume it runs on GPUs and not TPUs or other architecture-specific tensor processors? They are way more efficient for inference than a general-purpose GPU

@tomgoldsteincs It is quicker to ask him directly

Image in tweet by Tom Goldstein

@tomgoldsteincs Text-Davinci-003 tokens are sold for 2 cents per 1,000, so about 1.5 cents per 500 words. So unless ChatGPT is performing significantly more computation than the standard GPT-3.5 model, the cost to OpenAI should be less than that.
