pLSTM: parallelizable Linear Source Transition Mark networks

Abstract

Modern recurrent architectures, such as xLSTM and Mamba, have recently challenged the Transformer in language modeling. However, their structure either restricts them to sequences or requires processing multi-dimensional data structures, such as images or molecular graphs, in a pre-defined sequential order. In contrast, Multi-Dimensional RNNs (MDRNNs) are well suited for data with higher-level structure, such as 2D grids, trees, and directed acyclic graphs (DAGs). In this work, we extend the notion of multi-dimensionality to linear RNNs. We introduce parallelizable Linear Source Transition Mark networks (pLSTMs), which use Source, Transition, and Mark gates that act on the line graph of a general DAG. This enables parallelization for DAGs in analogy to parallel associative scans and the chunkwise-recurrent form of sequential linear RNNs. For regular grids (1D and 2D), such as images, this scheme can be implemented efficiently in logarithmic time using einsum operations, concatenations, and padding. pLSTMs tackle the vanishing/exploding activation/gradient problem for long distances in DAGs via two distinct modes: a directed propagation mode (P-mode) and a diffusive distribution mode (D-mode). To showcase the long-range capabilities of pLSTM, we introduce arrow-pointing extrapolation, a synthetic computer vision task that contains long-distance directional information. We demonstrate that pLSTMs generalize well to larger image sizes, whereas Transformers struggle to extrapolate. On established molecular graph and computer vision benchmarks, pLSTMs also show strong performance. Code and datasets are available at: https://github.com/ml-jku/plstm_experiments.
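The parallel associative scan that the abstract uses as its point of analogy can be illustrated for the plain 1D sequential case. The sketch below is not the pLSTM implementation and does not cover DAGs or 2D grids; it only assumes a scalar linear recurrence h_t = a_t * h_{t-1} + b_t with h_{-1} = 0, and all function names (combine, sequential_scan, doubling_scan) are hypothetical. Each step is an affine map, and composing affine maps is associative, so all prefixes can be computed in about log2(T) rounds.

```python
def combine(left, right):
    """Compose two affine maps h -> a*h + b; `left` is applied first."""
    a_l, b_l = left
    a_r, b_r = right
    return (a_r * a_l, a_r * b_l + b_r)


def sequential_scan(a, b):
    """Reference loop: h_t = a_t * h_{t-1} + b_t with h_{-1} = 0."""
    h, out = 0.0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


def doubling_scan(a, b):
    """Same recurrence in ceil(log2(T)) rounds (Hillis-Steele style doubling)."""
    elems = list(zip(a, b))
    shift = 1
    while shift < len(elems):
        # Within a round every position only reads the previous round's values,
        # so on parallel hardware all positions could be updated at once.
        elems = [
            combine(elems[t - shift], elems[t]) if t >= shift else elems[t]
            for t in range(len(elems))
        ]
        shift *= 2
    # With h_{-1} = 0, the offset of the composed map is the state h_t.
    return [b_t for _, b_t in elems]
```

Here the list comprehension stands in for the parallel update; it runs sequentially in plain Python, so the sketch shows the structure of the computation, not a speedup.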
