Rohan Paul's Twitter Thread

🏗️ Hardware Memory bandwidth is becoming the choke point slowing down GenAI. During 2018–2022, transformer model size grew ~410× every 2 years, while memory per accelerator grew only about 2× every 2 years. And that mismatch shoves us into a “Memory-Wall”. The "memory wall" is

🧵2/n. "AI and Memory Wall": The availability of unprecedented unsupervised training data, along with neural scaling laws, has resulted in an unprecedented surge in model size and compute requirements for serving/training LLMs. However, the main performance bottleneck is

🧵3/n. We can see the huge growth in HBM (High Bandwidth Memory) bit demand that has come alongside AI accelerator demand. This is a direct manifestation of that "Memory Wall". As models grow, we need more bits of HBM just to store weights, activations, and KV caches. That is

@rohanpaul_ai This is the exact problem we’re solving with low-cost (NVMe) full memory bandwidth (HBM+DRAM) augmented memory from @WekaIO. See real-world agentic benchmarks on CoreWeave

@rohanpaul_ai 🔥 The real GenAI bottleneck isn’t compute—it’s memory bandwidth. Over the past few years, model sizes exploded 400× while memory bandwidth barely doubled. GPUs are faster than ever, yet most of that power sits idle waiting for data. That’s the Memory Wall. 💡 The next big leap

@rohanpaul_ai genai devs realizing memory wall isn't just a diagram, it's a lifestyle

@rohanpaul_ai In-memory compute is the clear next step. Not only can it perform some operations independently, but it could also use compression to dramatically boost bandwidth.

@rohanpaul_ai this is why sparse models and in-memory compute architectures are gaining steam, brute forcing through the memory wall just isn’t sustainable long term

@rohanpaul_ai This is why Micron is still so cheap, even though the stock keeps running up. The stock price can’t keep up with memory demand and earnings growth. Plus margins are exploding because memory prices are going up every day. That’s why Micron didn’t lock in memory prices for

@rohanpaul_ai Would HBF be able to help? It's a year away.

@rohanpaul_ai Agreed

@rohanpaul_ai It’s always the DRAM as the choke point for any HPC. Teams spend weeks debugging because memory throughput is never a tracked metric, nor is it published by SaaS providers (unlike “disk” and network throughput), only to find the same root cause every time 🙃

@rohanpaul_ai GPT4 is not a 10T model, maybe 1T or 2T but not more

@rohanpaul_ai The text only version of GPT-4 was only 1.8T params, not 10T lol. The vision component of GPT-4 would have added a few more parameters but I don't think it would have massively changed the size, so that estimate is 5x larger than it should be.

@rohanpaul_ai hopefully we can reduce the need to even have higher bandwidth through more efficient models or entire redesigns.

@rohanpaul_ai It's not just the Memory wall; we are already approaching an LLM wall as we plateau with LLM advancements #LLMs

@rohanpaul_ai 🙏

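@rohanpaul_ai Rohan points out that memory bandwidth has become the bottleneck for GenAI. Per the Roofline model, compute keeps rising while memory does not keep up, which leaves operators memory-bound, so we need to explore near-memory computing, HBM3+, and restructured cache hierarchies. Which architecture do you think is best placed to break through the "Memory‑Wall"?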
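
For readers unfamiliar with the Roofline model this reply cites, here is a minimal Python sketch of the bound it describes: attainable throughput is the smaller of the compute peak and memory bandwidth times arithmetic intensity (FLOPs per byte moved). The accelerator figures are hypothetical round numbers chosen for illustration, not the specs of any real chip, and the function names are made up for this example.

```python
def attainable_flops(peak_flops: float, mem_bandwidth: float,
                     arithmetic_intensity: float) -> float:
    """Roofline bound in FLOP/s: min(compute roof, bandwidth * intensity)."""
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def ridge_point(peak_flops: float, mem_bandwidth: float) -> float:
    """Arithmetic intensity (FLOPs/byte) above which a kernel is compute-bound."""
    return peak_flops / mem_bandwidth


def is_memory_bound(peak_flops: float, mem_bandwidth: float,
                    arithmetic_intensity: float) -> bool:
    """True when the kernel sits on the bandwidth-limited slope of the roofline."""
    return arithmetic_intensity < ridge_point(peak_flops, mem_bandwidth)


# Hypothetical accelerator (illustrative values only, not a real device).
PEAK = 1.0e15  # 1 PFLOP/s compute peak
BW = 2.0e12    # 2 TB/s memory bandwidth

# Assumed intensities: a low-reuse kernel and a high-reuse kernel.
low = attainable_flops(PEAK, BW, 2.0)      # limited by bandwidth
high = attainable_flops(PEAK, BW, 1000.0)  # limited by the compute peak

print(f"ridge point: {ridge_point(PEAK, BW):.0f} FLOPs/byte")
print(f"low-intensity kernel reaches {low / PEAK:.1%} of peak")
print(f"high-intensity kernel reaches {high / PEAK:.1%} of peak")
```

Under these assumed numbers, any kernel doing fewer than 500 FLOPs per byte moved cannot reach peak compute however fast the arithmetic units are, which is the "memory wall" in Roofline terms.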

@rohanpaul_ai Memory wall is skill issue

@rohanpaul_ai @grok does NVIDIA Blackwell have more memory per accelerator? How does xAI address this? Is this a real problem or have engineers already solved it? The paper seems old since it references H100s.
