FLARE: Fast Low-rank Attention Routing Engine
August 18, 2025
Authors: Vedant Puri, Aditya Joglekar, Kevin Ferguson, Yu-hsuan Chen, Yongjie Jessica Zhang, Levent Burak Kara
cs.AI
Abstract
The quadratic complexity of self-attention limits its applicability and
scalability on large unstructured meshes. We introduce Fast Low-rank Attention
Routing Engine (FLARE), a linear complexity self-attention mechanism that
routes attention through fixed-length latent sequences. Each attention head
performs global communication among N tokens by projecting the input sequence
onto a fixed-length latent sequence of M ≪ N tokens using learnable query
tokens. By routing attention through a bottleneck sequence, FLARE learns a
low-rank form of attention that can be applied at O(NM) cost. FLARE not only
scales to unprecedented problem sizes, but also delivers superior accuracy
compared to state-of-the-art neural PDE surrogates across diverse benchmarks.
We also release a new additive manufacturing dataset to spur further research.
Our code is available at https://github.com/vpuri3/FLARE.py.
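The routing mechanism described in the abstract can be sketched compactly. The PyTorch module below is an illustrative assumption, not the released FLARE implementation (see the linked repository for that): per head, M learnable query tokens attend to the N input tokens to form a latent summary, and that summary is broadcast back to the N tokens by reusing the transposed attention scores, so each head costs O(NM) rather than O(N^2). The module name, the score-reuse decode step, and the default sizes are choices of this sketch.

```python
import torch
import torch.nn as nn


class LatentRoutedAttention(nn.Module):
    """Sketch (not the authors' code): attention routed through M learnable
    latent query tokens per head, giving O(N*M) cost instead of O(N^2)."""

    def __init__(self, dim: int, num_heads: int = 8, num_latents: int = 64):
        super().__init__()
        assert dim % num_heads == 0
        self.h, self.m, self.dh = num_heads, num_latents, dim // num_heads
        self.scale = self.dh ** -0.5
        # learnable query tokens: one fixed-length latent sequence per head
        self.latent_q = nn.Parameter(0.02 * torch.randn(num_heads, num_latents, self.dh))
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        split = lambda t: t.view(b, n, self.h, self.dh).transpose(1, 2)  # (B, H, N, Dh)
        k, v = split(self.to_k(x)), split(self.to_v(x))
        q = self.latent_q.unsqueeze(0).expand(b, -1, -1, -1)             # (B, H, M, Dh)

        scores = q @ k.transpose(-2, -1) * self.scale                    # (B, H, M, N)
        # encode: the M latent tokens gather information from all N tokens
        z = torch.softmax(scores, dim=-1) @ v                            # (B, H, M, Dh)
        # decode: scatter the latent summary back to the N tokens, here by
        # reusing the transposed scores (an assumption of this sketch)
        y = torch.softmax(scores.transpose(-2, -1), dim=-1) @ z          # (B, H, N, Dh)

        y = y.transpose(1, 2).reshape(b, n, d)
        return self.out(y)
```

Since M is fixed, both matrix products scale linearly in N, which is what allows this style of attention to be applied to large unstructured meshes.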