RWKV-7 "Goose" with Expressive Dynamic State Evolution
March 18, 2025
Authors: Bo Peng, Ruichong Zhang, Daniel Goldstein, Eric Alcaide, Haowen Hou, Janna Lu, William Merrill, Guangyu Song, Kaifeng Tan, Saiteja Utpala, Nathan Wilce, Johan S. Wind, Tianyi Wu, Daniel Wuttke, Christian Zhou-Zheng
cs.AI
Abstract
We present RWKV-7 "Goose", a new sequence modeling architecture, along with
pre-trained language models that establish a new state-of-the-art in downstream
performance at the 3 billion parameter scale on multilingual tasks, and match
current SoTA English language performance despite being trained on dramatically
fewer tokens than other top 3B models. Nevertheless, RWKV-7 models require only
constant memory usage and constant inference time per token. RWKV-7 introduces
a newly generalized formulation of the delta rule with vector-valued gating and
in-context learning rates, as well as a relaxed value replacement rule. We show
that RWKV-7 can perform state tracking and recognize all regular languages,
while retaining parallelizability of training. This exceeds the capabilities of
Transformers, which under standard complexity conjectures are limited to
TC^0. To demonstrate RWKV-7's language modeling capability, we also
present an extended open source 3.1 trillion token multilingual corpus, and
train four RWKV-7 models ranging from 0.19 billion to 2.9 billion parameters on
this dataset.
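
The generalized delta rule mentioned above can be made concrete with a small sketch. The NumPy snippet below is a minimal illustration of a fixed-size matrix state updated by a delta-rule-style step with per-channel (vector-valued) decay and a vector-valued in-context learning rate. The symbol names `w`, `a`, `k`, `v`, `r` and the exact form of the update are assumptions made for illustration, not RWKV-7's actual parameterization from the paper; it does, however, show why memory use and per-token compute stay constant: the state is a fixed-size matrix updated in place.

```python
import numpy as np

# Minimal sketch of a generalized delta-rule state update (illustrative only;
# the symbols w, a, k, v, r and this exact form are assumptions, not the
# paper's RWKV-7 parameterization).

def state_update(S, w, a, k, v):
    """One recurrent step on a fixed-size matrix state S (d x d).

    Classic delta rule:  S <- S (I - beta * k k^T) + beta * v k^T  (scalar beta).
    Generalized sketch:  per-channel decay w (vector-valued gating) and a
    vector-valued in-context learning rate a; the write term is left unscaled
    here for simplicity.
    """
    k = k / (np.linalg.norm(k) + 1e-12)          # normalized key
    decay = np.diag(w)                           # vector-valued gating
    erase = np.outer(a * k, k)                   # data-dependent removal of the old value
    return S @ (decay - erase) + np.outer(v, k)  # write the new value

def readout(S, r):
    # Constant memory and constant time per token: S never grows with context length.
    return S @ r

# Toy usage on a stream of tokens
d = 8
S = np.zeros((d, d))
rng = np.random.default_rng(0)
for _ in range(16):
    w = rng.uniform(0.9, 1.0, size=d)            # decay close to 1
    a = rng.uniform(0.0, 1.0, size=d)            # in-context learning rate
    k, v, r = rng.standard_normal((3, d))
    S = state_update(S, w, a, k, v)
    out = readout(S, r)
```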
To foster openness, reproduction, and adoption, we release our models and
dataset component listing at https://huggingface.co/RWKV, and our training and
inference code at https://github.com/RWKV/RWKV-LM, all under the Apache 2.0
License.
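
On the state-tracking claim in the abstract: recognizing a regular language amounts to tracking a finite automaton's state, which a recurrence with input-dependent state transitions can do in a single pass with constant memory. The toy example below is an assumed illustration of that general idea, not RWKV-7's mechanism: it tracks the DFA for binary strings with an even number of 1s by composing token-dependent transition matrices, the kind of sequential composition that fixed-depth Transformers, limited to TC^0 under standard conjectures, are not believed to handle over arbitrarily long inputs.

```python
import numpy as np

# Toy illustration (assumed example, not from the paper): recognizing a regular
# language by tracking a DFA state with input-dependent transition matrices.
# Language: binary strings containing an even number of 1s.
TRANSITIONS = {
    "0": np.array([[1, 0],
                   [0, 1]]),   # reading '0' keeps the parity state
    "1": np.array([[0, 1],
                   [1, 0]]),   # reading '1' flips even <-> odd
}

def accepts(word: str) -> bool:
    state = np.array([1, 0])                 # one-hot start state: "even"
    for ch in word:
        state = TRANSITIONS[ch] @ state      # input-dependent state update
    return bool(state[0])                    # accept iff the parity is even

assert accepts("1010")      # two 1s -> even -> accepted
assert not accepts("100")   # one 1  -> odd  -> rejected
```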