
HGRN2: Gated Linear RNNs with State Expansion

April 11, 2024
Authors: Zhen Qin, Songlin Yang, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, Yiran Zhong
cs.AI

Abstract

Hierarchically gated linear RNN (HGRN; Qin et al. 2023) has demonstrated competitive training speed and performance in language modeling while offering efficient inference. However, the recurrent state size of HGRN remains relatively small, which limits its expressiveness. To address this issue, inspired by linear attention, we introduce a simple outer-product-based state expansion mechanism so that the recurrent state size can be significantly enlarged without introducing any additional parameters. The linear attention form also allows for hardware-efficient training. Our extensive experiments verify the advantage of HGRN2 over HGRN1 in language modeling, image classification, and the Long Range Arena. Our largest 3B HGRN2 model slightly outperforms Mamba and the LLaMA-architecture Transformer in language modeling in a controlled experiment setting, and performs competitively with many open-source 3B models in downstream evaluation while using far fewer total training tokens.
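To make the state-expansion idea concrete, below is a minimal, step-by-step sketch of the kind of gated linear recurrence the abstract describes, assuming (as in linear attention) a forget gate f_t in (0, 1)^{d_k}, an input gate tied to its complement (1 - f_t) that serves as the key in an outer-product "write", and a query q_t that reads the expanded (d_k x d_v) state. The function name, gate parameterizations, and toy shapes here are illustrative assumptions, not the paper's hardware-efficient implementation.

import torch

def hgrn2_style_recurrence(q, f, v):
    # q, f: (T, d_k) query (output-gate) and forget-gate sequences, f in (0, 1)
    # v:    (T, d_v) input values; the recurrent state S is (d_k, d_v),
    # i.e. d_k times larger than an element-wise gated state of size d_v
    T, d_k = f.shape
    d_v = v.shape[-1]
    S = torch.zeros(d_k, d_v)            # expanded recurrent state
    outputs = []
    for t in range(T):
        # per-dimension decay by the forget gate, plus an outer-product
        # write whose key is the complementary input gate (1 - f_t)
        S = f[t].unsqueeze(-1) * S + torch.outer(1.0 - f[t], v[t])
        outputs.append(S.T @ q[t])       # linear-attention-style read
    return torch.stack(outputs)          # (T, d_v)

# toy usage: T=8 steps, key dim 16, value dim 32
q = torch.softmax(torch.randn(8, 16), dim=-1)
f = torch.sigmoid(torch.randn(8, 16))
v = torch.randn(8, 32)
o = hgrn2_style_recurrence(q, f, v)      # shape (8, 32)

Note how the (d_k, d_v) state is built entirely from outer products of gate and input vectors the model already computes, which is how the state can grow without adding parameters; the paper's actual training path uses a chunked, hardware-efficient linear-attention form rather than this sequential loop.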

