

HGRN2: Gated Linear RNNs with State Expansion

April 11, 2024
Authors: Zhen Qin, Songlin Yang, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, Yiran Zhong
cs.AI

Abstract

Hierarchically gated linear RNN (HGRN, Qin et al. 2023) has demonstrated competitive training speed and performance in language modeling, while offering efficient inference. However, the recurrent state size of HGRN remains relatively small, which limits its expressiveness. To address this issue, inspired by linear attention, we introduce a simple outer-product-based state expansion mechanism so that the recurrent state size can be significantly enlarged without introducing any additional parameters. The linear attention form also allows for hardware-efficient training. Our extensive experiments verify the advantage of HGRN2 over HGRN1 in language modeling, image classification, and the Long Range Arena. Our largest 3B HGRN2 model slightly outperforms Mamba and the LLaMA-architecture Transformer for language modeling in a controlled experiment setting, and performs competitively with many open-source 3B models in downstream evaluation while using far fewer total training tokens.
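
To make the state-expansion idea concrete, below is a minimal sketch of the recurrence in its sequential (inference-style) form, assuming the linear-attention reading of the abstract: the input gate is tied to 1 - f_t so the expansion adds no parameters, the vector state of HGRN1 becomes a d x e matrix updated by an outer product, and a query vector reads the state out. The function name, tensor names, and shapes here are illustrative assumptions, not the authors' implementation.

# Unofficial sketch of an HGRN2-style gated linear recurrence with
# outer-product state expansion. Names (q, f, v, S) are illustrative.
import torch

def hgrn2_recurrence(q, f, v):
    """Sequential form of the expanded-state gated linear recurrence.

    q: (T, d)  query vectors used to read the state out
    f: (T, d)  forget gates in (0, 1), e.g. from a sigmoid
    v: (T, e)  input values
    Returns o: (T, e) outputs.
    """
    T, d = f.shape
    e = v.shape[-1]
    S = torch.zeros(d, e)  # matrix-valued state (d x e), vs. HGRN1's size-d vector
    outputs = []
    for t in range(T):
        k = 1.0 - f[t]  # input gate tied to the forget gate: no extra parameters
        # S_t = diag(f_t) S_{t-1} + k_t v_t^T  (outer-product state expansion)
        S = f[t].unsqueeze(-1) * S + torch.outer(k, v[t])
        outputs.append(S.T @ q[t])  # read out the state with the query
    return torch.stack(outputs)

# Toy usage
T, d, e = 8, 16, 16
q = torch.randn(T, d)
f = torch.sigmoid(torch.randn(T, d))
v = torch.randn(T, e)
o = hgrn2_recurrence(q, f, v)
print(o.shape)  # torch.Size([8, 16])

Because the update has the same q/k/v structure as linear attention (with k_t = 1 - f_t), training can use the chunk-parallel, hardware-efficient formulations developed for linear attention rather than this step-by-step loop.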
