

Adamas: Hadamard Sparse Attention for Efficient Long-Context Inference

October 21, 2025
作者: Siyuan Yan, Guo-Qing Jiang, Yuchen Zhang, Xiaoxing Ma, Ran Zhu, Chun Cao, Jingwei Xu
cs.AI

Abstract

Large language models (LLMs) now support context windows of hundreds of thousands to millions of tokens, enabling applications such as long-document summarization, large-scale code synthesis, multi-document question answering, and persistent multi-turn dialogue. However, such extended contexts exacerbate the quadratic cost of self-attention, leading to severe latency in autoregressive decoding. Existing sparse attention methods alleviate these costs but rely on heuristic patterns that struggle to recall the critical key-value (KV) pairs for each query, resulting in accuracy degradation. We introduce Adamas, a lightweight yet highly accurate sparse attention mechanism designed for long-context inference. Adamas applies the Hadamard transform, bucketization, and 2-bit compression to produce compact representations, and leverages Manhattan-distance estimation for efficient top-k selection. Experiments show that Adamas matches the accuracy of full attention with a budget of only 64 tokens, achieves near-lossless performance at a budget of 128, and supports up to 8x higher sparsity than prior state-of-the-art (SOTA) methods, while delivering up to 4.4x self-attention and 1.5x end-to-end speedups on 32K-length sequences. Remarkably, Adamas attains perplexity comparable to or even lower than full attention, underscoring its effectiveness in maintaining accuracy under aggressive sparsity.
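The pipeline the abstract describes can be illustrated with a short sketch. The NumPy code below is a minimal illustration of the idea, not the paper's implementation: the function names (`fwht`, `to_2bit_codes`, `select_topk`), the clipping range, and the uniform bucket boundaries are all assumptions, and the actual method would use fused kernels and its own quantization scheme. The intuition is that the Hadamard transform is orthogonal, so it preserves inner products while flattening per-coordinate distributions; a coarse 2-bit code per coordinate plus Manhattan distance between codes then gives a cheap proxy for ranking cached keys against a query.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform over the last axis (length must be a power of 2)."""
    y = np.asarray(x, dtype=np.float32).copy()
    n = y.shape[-1]
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = y[..., i:i + h].copy()
            b = y[..., i + h:i + 2 * h].copy()
            y[..., i:i + h] = a + b
            y[..., i + h:i + 2 * h] = a - b
        h *= 2
    return y / np.sqrt(n)  # orthonormal scaling, so inner products are preserved

def to_2bit_codes(x, clip=3.0):
    """Quantize each coordinate into one of 4 uniform buckets (a 2-bit code).
    The clipping range and uniform boundaries are illustrative assumptions."""
    q = (np.clip(x, -clip, clip) + clip) * (4.0 / (2.0 * clip))
    return np.minimum(q.astype(np.uint8), 3)  # codes in {0, 1, 2, 3}

def select_topk(q, K, budget):
    """Return indices of the `budget` keys whose 2-bit Hadamard codes are
    closest to the query's code under Manhattan distance."""
    cq = to_2bit_codes(fwht(q))   # (d,)   query code
    ck = to_2bit_codes(fwht(K))   # (n, d) key codes
    dist = np.abs(ck.astype(np.int16) - cq.astype(np.int16)).sum(axis=-1)
    return np.argsort(dist)[:budget]

# Toy usage: pick 64 candidate keys out of a 4096-entry KV cache.
rng = np.random.default_rng(0)
q = rng.standard_normal(128).astype(np.float32)
K = rng.standard_normal((4096, 128)).astype(np.float32)
idx = select_topk(q, K, budget=64)
# Exact attention is then computed only over K[idx] and the matching values.
```

In a real decoder the key codes would be computed once, when each token enters the KV cache, so every decoding step only encodes the current query and scans compact 2-bit codes instead of full-precision keys before running exact attention on the selected subset.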