Ivan Fioravanti ᯅ's Twitter Thread

DeepSeek-R1-0528-5bit on MLX pushing M3 Ultra 512GB to its limits! 501GB of memory in use, visible in mactop in the video! Context: 4K tokens. Prompt: 190.29 t/s. Gen: 11.37 t/s. Peak mem: 487.48 GB! THIS IS APPLE MLX!

Here you can find the quantized (5.5 bits per weight) model on Hugging Face: https://huggingface.co/mlx-com...
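As a rough sanity check on why this model nearly fills 512GB, here is a back-of-envelope estimate of the weight footprint. The 671B total parameter count for DeepSeek-R1 is an assumption brought in from outside the thread, and so is applying 5.5 bits uniformly to every weight; real checkpoints mix precisions, so treat the result as approximate.

```python
# Back-of-envelope weight footprint for a quantized model.
# ASSUMPTION (not from the thread): DeepSeek-R1 has about 671B total parameters.
# ASSUMPTION: every weight costs 5.5 bits on average (the figure quoted in the thread).
TOTAL_PARAMS = 671e9
BITS_PER_WEIGHT = 5.5


def weight_footprint_gb(num_params: float, bits_per_weight: float) -> float:
    """Return the weight storage size in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1e9


weights_gb = weight_footprint_gb(TOTAL_PARAMS, BITS_PER_WEIGHT)
print(f"Estimated weights: {weights_gb:.1f} GB")
```

Under these assumptions the weights alone take most of the reported peak memory, with the remainder going to the KV cache and activations, which is consistent with peak memory climbing as the context grows.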

In the video I left the last part at normal speed to give a real idea of the pace. Thanks to the Apple MLX team for making it possible to do something like this! 🙏

8K context achieved! 💪 Prompt: 8145 tokens, 167.260 tokens-per-sec. Generation: 1101 tokens, 10.158 tokens-per-sec. Peak memory: 496.840 GB

16K!!! MLX no limit! 💪💪 Prompt: 15777 tokens, 131.764 tokens-per-sec. Generation: 1265 tokens, 7.329 tokens-per-sec. Peak memory: 510.726 GB
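To put the tokens-per-second figures in wall-clock terms, here is a small sketch that divides each token count by its reported rate. The numbers are copied from the 8K and 16K runs above; the `runs` dictionary and the `elapsed_seconds` helper are illustrative names, not part of mlx-lm.

```python
# Convert the reported throughput into approximate elapsed time per phase.
# Figures are taken from the 8K and 16K runs in this thread.
runs = {
    "8K": {"prompt_tokens": 8145, "prompt_tps": 167.260,
           "gen_tokens": 1101, "gen_tps": 10.158},
    "16K": {"prompt_tokens": 15777, "prompt_tps": 131.764,
            "gen_tokens": 1265, "gen_tps": 7.329},
}


def elapsed_seconds(tokens: int, tokens_per_sec: float) -> float:
    """Time needed to process `tokens` at a steady `tokens_per_sec` rate."""
    return tokens / tokens_per_sec


for name, r in runs.items():
    prompt_s = elapsed_seconds(r["prompt_tokens"], r["prompt_tps"])
    gen_s = elapsed_seconds(r["gen_tokens"], r["gen_tps"])
    print(f"{name}: prompt ~{prompt_s:.0f}s, generation ~{gen_s:.0f}s, "
          f"total ~{prompt_s + gen_s:.0f}s")
```

This assumes the reported rates are averages over each phase, so the results are estimates of the end-to-end time rather than measured durations.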

Here is the video of a second try with 16K. DeepSeek-R1-0528-5bit running on macOS 26 Beta 2 with mlx-lm 0.25.3 🚀
