pLSTM: parallelizable Linear Source Transition Mark networks
June 13, 2025
Authors: Korbinian Pöppel, Richard Freinschlag, Thomas Schmied, Wei Lin, Sepp Hochreiter
cs.AI
Abstract
Modern recurrent architectures, such as xLSTM and Mamba, have recently
challenged the Transformer in language modeling. However, their structure
constrains their applicability to sequences only or requires processing
multi-dimensional data structures, such as images or molecular graphs, in a
pre-defined sequential order. In contrast, Multi-Dimensional RNNs (MDRNNs) are
well suited for data with a higher level structure, like 2D grids, trees, and
directed acyclic graphs (DAGs). In this work, we extend the notion of
multi-dimensionality to linear RNNs. We introduce parallelizable Linear Source
Transition Mark networks (pLSTMs) using Source, Transition, and Mark gates that
act on the line graph of a general DAG. This enables parallelization in analogy
to parallel associative scans and the chunkwise-recurrent form of sequential
linear RNNs, but for DAGs. For regular grids (1D and 2D), like images, this
scheme can be efficiently implemented using einsum operations, concatenations,
and padding in logarithmic time. pLSTMs tackle the vanishing/exploding
activation/gradient problem for long distances in DAGs via two distinct modes:
a directed propagation mode (P-mode) and a diffusive distribution mode
(D-mode). To showcase the long-range capabilities of pLSTM, we introduce
arrow-pointing extrapolation as a synthetic computer vision task that contains
long-distance directional information. We demonstrate that pLSTMs generalize
well to larger image sizes, whereas Transformers struggle to extrapolate. On
established molecular graph and computer vision benchmarks, pLSTMs also show
strong performance. Code and datasets are available at:
https://github.com/ml-jku/plstm_experiments.
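The abstract's claim that linear RNNs parallelize via associative scans can be illustrated with a minimal 1D sketch. This is not the paper's pLSTM kernel (which operates on the line graph of a DAG with Source, Transition, and Mark gates); it only shows the underlying principle for the scalar recurrence h_t = a_t * h_{t-1} + b_t, where the function names and the Hillis-Steele scan variant are this sketch's own choices:

```python
import numpy as np

def linear_scan_sequential(a, b):
    """Reference: h_t = a_t * h_{t-1} + b_t with h_0 = 0, step by step."""
    h, out = 0.0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return np.array(out)

def linear_scan_parallel(a, b):
    """Same recurrence via a Hillis-Steele style associative scan.

    Each position carries the affine map h -> a_t * h + b_t. Composing
    two such maps is associative, so the scan needs only O(log n)
    parallel combine steps instead of n sequential ones.
    """
    a = np.asarray(a, dtype=float).copy()
    b = np.asarray(b, dtype=float).copy()
    n, step = len(a), 1
    while step < n:
        # Neighbor `step` positions back; pad with the identity map (a=1, b=0).
        a_prev = np.concatenate([np.ones(step), a[:-step]])
        b_prev = np.concatenate([np.zeros(step), b[:-step]])
        # Compose maps: (a_prev, b_prev) then (a, b) -> (a_prev*a, a*b_prev + b).
        b = a * b_prev + b  # must use the old `a`, so update b first
        a = a_prev * a
        step *= 2
    return b  # b now holds the inclusive scan h_1, ..., h_n
```

With gate magnitudes |a_t| <= 1 the recurrence stays numerically stable; the paper's contribution is generalizing the pair (a_t, b_t) from a chain to the edges of a general DAG's line graph, which this 1D example does not capture.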