
RWKV-7 "Goose" with Expressive Dynamic State Evolution

March 18, 2025
Authors: Bo Peng, Ruichong Zhang, Daniel Goldstein, Eric Alcaide, Haowen Hou, Janna Lu, William Merrill, Guangyu Song, Kaifeng Tan, Saiteja Utpala, Nathan Wilce, Johan S. Wind, Tianyi Wu, Daniel Wuttke, Christian Zhou-Zheng
cs.AI

Abstract

We present RWKV-7 "Goose", a new sequence modeling architecture, along with pre-trained language models that establish a new state of the art in downstream performance at the 3 billion parameter scale on multilingual tasks, and match current SoTA English language performance despite being trained on dramatically fewer tokens than other top 3B models. Notably, RWKV-7 models require only constant memory usage and constant inference time per token. RWKV-7 introduces a newly generalized formulation of the delta rule with vector-valued gating and in-context learning rates, as well as a relaxed value replacement rule. We show that RWKV-7 can perform state tracking and recognize all regular languages, while retaining parallelizability of training. This exceeds the capabilities of Transformers, which under standard complexity conjectures are limited to TC^0. To demonstrate RWKV-7's language modeling capability, we also present an extended open source 3.1 trillion token multilingual corpus, and train four RWKV-7 models ranging from 0.19 billion to 2.9 billion parameters on this dataset. To foster openness, reproduction, and adoption, we release our models and dataset component listing at https://huggingface.co/RWKV, and our training and inference code at https://github.com/RWKV/RWKV-LM, all under the Apache 2.0 License.
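
The abstract's central mechanism, a generalized delta rule applied to a fixed-size recurrent state, can be illustrated with a short recurrence. The Python sketch below is a hedged illustration only: the names (S, k, v, w, a), shapes, and the exact update are assumptions chosen to show the idea of vector-valued gating, an in-context learning rate, and delta-rule value replacement, not RWKV-7's published equations.

```python
# Hedged sketch of a generalized delta-rule state update (illustrative only;
# not RWKV-7's exact formulation). A fixed-size state matrix S is decayed
# channel-wise by a vector gate w, the value currently stored under key k is
# partially removed with a per-channel in-context learning rate a, and a new
# key/value association is written. Memory and per-token cost stay constant.
import numpy as np

d = 8                       # head dimension (illustrative)
S = np.zeros((d, d))        # recurrent state: size independent of sequence length

def step(S, k, v, w, a):
    """One token of the sketch update: S <- S (diag(w) - k (a*k)^T) + v k^T."""
    k = k / (np.linalg.norm(k) + 1e-8)                  # normalize the removal key
    S = S * w[None, :] - np.outer(S @ k, a * k) + np.outer(v, k)
    return S

rng = np.random.default_rng(0)
for _ in range(16):                                     # constant work per token
    k, v = rng.standard_normal(d), rng.standard_normal(d)
    w = 1.0 / (1.0 + np.exp(-rng.standard_normal(d)))   # decay gate in (0, 1)
    a = 1.0 / (1.0 + np.exp(-rng.standard_normal(d)))   # in-context learning rate in (0, 1)
    S = step(S, k, v, w, a)

print(S.shape)  # (8, 8): the state never grows with context length
```

Because the state is a single d x d matrix per head updated in place, this kind of recurrence uses constant memory and constant time per token, which is the property the abstract highlights.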
