
pLSTM: parallelizable Linear Source Transition Mark networks

June 13, 2025
Authors: Korbinian Pöppel, Richard Freinschlag, Thomas Schmied, Wei Lin, Sepp Hochreiter
cs.AI

Abstract

Modern recurrent architectures, such as xLSTM and Mamba, have recently challenged the Transformer in language modeling. However, their structure constrains their applicability to sequences only or requires processing multi-dimensional data structures, such as images or molecular graphs, in a pre-defined sequential order. In contrast, Multi-Dimensional RNNs (MDRNNs) are well suited for data with a higher level structure, like 2D grids, trees, and directed acyclic graphs (DAGs). In this work, we extend the notion of multi-dimensionality to linear RNNs. We introduce parallelizable Linear Source Transition Mark networks (pLSTMs) using Source, Transition, and Mark gates that act on the line graph of a general DAG. This enables parallelization in analogy to parallel associative scans and the chunkwise-recurrent form of sequential linear RNNs, but for DAGs. For regular grids (1D and 2D), like images, this scheme can be efficiently implemented using einsum operations, concatenations, and padding in logarithmic time. pLSTMs tackle the vanishing/exploding activation/gradient problem for long distances in DAGs via two distinct modes: a directed propagation mode (P-mode) and a diffusive distribution mode (D-mode). To showcase the long-range capabilities of pLSTM, we introduce arrow-pointing extrapolation as a synthetic computer vision task that contains long-distance directional information. We demonstrate that pLSTMs generalize well to larger image sizes, whereas Transformers struggle to extrapolate. On established molecular graph and computer vision benchmarks, pLSTMs also show strong performance. Code and Datasets are available at: https://github.com/ml-jku/plstm_experiments.
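The parallelization the abstract refers to rests on the fact that a linear recurrence h_t = a_t·h_{t-1} + b_t composes associatively: two consecutive segments of the recurrence fuse into one segment of the same form, so all prefix states can be computed in O(log T) parallel steps. The sketch below illustrates that standard associative-scan trick for a plain sequential linear RNN; the function names are ours, and this is not the paper's actual Source/Transition/Mark implementation, which generalizes the same idea from chains to DAG line graphs.

```python
import numpy as np

def combine(left, right):
    """Compose two linear-recurrence segments.

    A segment (a, b) maps an incoming state h to a*h + b. Applying
    `left` first and then `right` yields another segment of the same form:
    h -> a_r*(a_l*h + b_l) + b_r = (a_r*a_l)*h + (a_r*b_l + b_r).
    """
    a_l, b_l = left
    a_r, b_r = right
    return (a_r * a_l, a_r * b_l + b_r)

def scan_linear_rnn(a, b):
    """Inclusive scan over h_t = a_t*h_{t-1} + b_t with h_0 = 0.

    Uses the Hillis-Steele doubling scheme: O(log T) rounds, each of
    which could run fully in parallel (simulated sequentially here).
    """
    elems = list(zip(a, b))
    step = 1
    while step < len(elems):
        elems = [
            combine(elems[t - step], elems[t]) if t >= step else elems[t]
            for t in range(len(elems))
        ]
        step *= 2
    # After the scan, the b-component of each prefix segment is h_t.
    return np.array([b_t for _, b_t in elems])

def naive_linear_rnn(a, b):
    """Reference sequential recurrence for comparison."""
    h, out = 0.0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return np.array(out)
```

With gate values a_t < 1 the scan reproduces the sequential states exactly while avoiding the step-by-step dependency; the chunkwise-recurrent form mentioned in the abstract applies the same segment composition at the block level rather than per timestep.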