pLSTM: parallelizable Linear Source Transition Mark networks
June 13, 2025
Authors: Korbinian Pöppel, Richard Freinschlag, Thomas Schmied, Wei Lin, Sepp Hochreiter
cs.AI
Abstract
Modern recurrent architectures, such as xLSTM and Mamba, have recently
challenged the Transformer in language modeling. However, their structure
constrains their applicability to sequences only or requires processing
multi-dimensional data structures, such as images or molecular graphs, in a
pre-defined sequential order. In contrast, Multi-Dimensional RNNs (MDRNNs) are
well suited for data with a higher level structure, like 2D grids, trees, and
directed acyclic graphs (DAGs). In this work, we extend the notion of
multi-dimensionality to linear RNNs. We introduce parallelizable Linear Source
Transition Mark networks (pLSTMs) using Source, Transition, and Mark gates that
act on the line graph of a general DAG. This enables parallelization in analogy
to parallel associative scans and the chunkwise-recurrent form of sequential
linear RNNs, but for DAGs. For regular grids (1D and 2D), like images, this
scheme can be efficiently implemented using einsum operations, concatenations,
and padding in logarithmic time. pLSTMs tackle the vanishing/exploding
activation/gradient problem for long distances in DAGs via two distinct modes:
a directed propagation mode (P-mode) and a diffusive distribution mode
(D-mode). To showcase the long-range capabilities of pLSTM, we introduce
arrow-pointing extrapolation as a synthetic computer vision task that contains
long-distance directional information. We demonstrate that pLSTMs generalize
well to larger image sizes, whereas Transformers struggle to extrapolate. On
established molecular graph and computer vision benchmarks, pLSTMs also show
strong performance. Code and datasets are available at:
https://github.com/ml-jku/plstm_experiments.
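The abstract's claim that linear RNNs parallelize via associative scans can be illustrated with a minimal 1D sketch. This is not the paper's pLSTM kernel (which operates on the line graph of a DAG with Source, Transition, and Mark gates); it only shows the underlying principle for the scalar recurrence h_t = a_t * h_{t-1} + b_t, where the function names and the Hillis-Steele scan variant are this sketch's own choices:

```python
import numpy as np

def linear_scan_sequential(a, b):
    """Reference: h_t = a_t * h_{t-1} + b_t with h_0 = 0, step by step."""
    h, out = 0.0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return np.array(out)

def linear_scan_parallel(a, b):
    """Same recurrence via a Hillis-Steele style associative scan.

    Each position carries the affine map h -> a_t * h + b_t. Composing
    two such maps is associative, so the scan needs only O(log n)
    parallel combine steps instead of n sequential ones.
    """
    a = np.asarray(a, dtype=float).copy()
    b = np.asarray(b, dtype=float).copy()
    n, step = len(a), 1
    while step < n:
        # Neighbor `step` positions back; pad with the identity map (a=1, b=0).
        a_prev = np.concatenate([np.ones(step), a[:-step]])
        b_prev = np.concatenate([np.zeros(step), b[:-step]])
        # Compose maps: (a_prev, b_prev) then (a, b) -> (a_prev*a, a*b_prev + b).
        b = a * b_prev + b  # must use the old `a`, so update b first
        a = a_prev * a
        step *= 2
    return b  # b now holds the inclusive scan h_1, ..., h_n
```

With gate magnitudes |a_t| <= 1 the recurrence stays numerically stable; the paper's contribution is generalizing the pair (a_t, b_t) from a chain to the edges of a general DAG's line graph, which this 1D example does not capture.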