Published: June 26, 2025

Attention! A new and improved attention mechanism has just been proposed by researchers at MIT, NVIDIA, Princeton, and others. Radial Attention is a sparse, static attention mechanism with O(n log n) complexity. It focuses on nearby tokens and shrinks the attention window as temporal distance grows, and it works with LoRA-based fine-tuning on existing models.

Results:
⚡ Up to 1.9× faster inference
✅ 4× longer videos
💰 4.4× cheaper training

They also identify a phenomenon termed Spatiotemporal Energy Decay in video diffusion models: post-softmax attention scores diminish as the spatial and temporal distance between tokens increases, akin to the physical decay of signals or waves over space and time in nature.
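To make the O(n log n) idea concrete, here is a minimal, illustrative sketch of a static sparse mask in which attention density halves each time token distance doubles, mirroring the energy-decay intuition. This is an assumption-laden toy (the window size, stride scheme, and `radial_style_mask` name are my own), not the paper's exact mask design.

```python
import numpy as np

def radial_style_mask(n: int, base_window: int = 2) -> np.ndarray:
    # Illustrative static sparse mask (NOT the paper's exact pattern):
    # each query attends densely to a small local window, then to
    # exponentially strided tokens in farther distance bands, so the
    # per-query budget is O(log n) and the total is O(n log n).
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        mask[i, i] = True
        for j in range(n):
            d = abs(i - j)
            if d == 0:
                continue
            if d <= base_window:
                mask[i, j] = True  # dense local window
            else:
                band = int(np.floor(np.log2(d)))  # distance band [2^b, 2^(b+1))
                stride = 2 ** band
                if j % stride == 0:  # keep ~1/stride of tokens in this band
                    mask[i, j] = True
    return mask

mask = radial_style_mask(64)
print(f"nonzeros: {mask.sum()} of {mask.size}")
```

Because the stride doubles with each distance band, each row touches only a handful of tokens per band, which is how a static pattern like this stays near O(n log n) nonzeros instead of the dense O(n²).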


Radial Attention: O(n log n) Sparse Attention with Energy Decay for Long Video Generation
Paper: https://arxiv.org/pdf/2506.198...
Code: https://github.com/mit-han-lab...

