Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
April 8, 2024
Authors: Bo Peng, Daniel Goldstein, Quentin Anthony, Alon Albalak, Eric Alcaide, Stella Biderman, Eugene Cheah, Teddy Ferdinan, Haowen Hou, Przemysław Kazienko, Kranthi Kiran GV, Jan Kocoń, Bartłomiej Koptyra, Satyapriya Krishna, Ronald McClelland Jr., Niklas Muennighoff, Fares Obeid, Atsushi Saito, Guangyu Song, Haoqin Tu, Stanisław Woźniak, Ruichong Zhang, Bingchen Zhao, Qihang Zhao, Peng Zhou, Jian Zhu, Rui-Jie Zhu
cs.AI
Abstract
We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models improving upon
the RWKV (RWKV-4) architecture. Our architectural design advancements include
multi-headed matrix-valued states and a dynamic recurrence mechanism that
improve expressivity while maintaining the inference efficiency characteristics
of RNNs. We introduce a new multilingual corpus with 1.12 trillion tokens and a
fast tokenizer based on greedy matching for enhanced multilinguality. We
trained four Eagle models, ranging from 0.46 to 7.5 billion parameters, and two
Finch models with 1.6 and 3.1 billion parameters, and found that they achieve
competitive performance across a wide variety of benchmarks. We release all our
models on HuggingFace under the Apache 2.0 license.
Models: https://huggingface.co/RWKV
Training code: https://github.com/RWKV/RWKV-LM
Inference code: https://github.com/RWKV/ChatRWKV
Time-parallel training code: https://github.com/RWKV/RWKV-infctx-trainer
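To give a feel for the multi-headed matrix-valued states mentioned in the abstract, the following is a minimal, hypothetical sketch of a single head's recurrence: the state is a d×d matrix that accumulates rank-1 key-value outer products under a per-channel decay, and is read out with a receptance vector. This is an illustrative simplification, not the exact Eagle/Finch formulation or its optimized kernel; the names `r`, `k`, `v`, `w`, `u` follow common RWKV notation but the code is an assumption, not the released implementation.

```python
import numpy as np

def matrix_state_recurrence(r, k, v, w, u):
    """Illustrative matrix-valued recurrent state for one head.

    r, k, v: (T, d) receptance/key/value sequences.
    w: (d,) per-channel decay in (0, 1).
    u: (d,) bonus weighting for the current token.
    Returns an (T, d) output sequence. Sketch only, not RWKV-5/6 itself.
    """
    T, d = k.shape
    S = np.zeros((d, d))               # matrix-valued state (d x d per head)
    out = np.zeros((T, d))
    for t in range(T):
        kv = np.outer(k[t], v[t])      # rank-1 update from the current token
        # read the state (plus a bonus term for the current token) with r
        out[t] = r[t] @ (np.diag(u) @ kv + S)
        # decay each state channel, then accumulate the new outer product
        S = np.diag(w) @ S + kv
    return out
```

Because the state is a matrix rather than a vector (as in RWKV-4), each head can store richer per-channel associations while inference cost per token stays constant in sequence length, preserving the RNN efficiency the abstract highlights.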
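The "fast tokenizer based on greedy matching" can likewise be sketched as longest-prefix matching against a fixed vocabulary. This is a hypothetical minimal version for illustration; the released RWKV World tokenizer operates over a much larger vocabulary with an efficient trie, which this naive loop does not attempt to reproduce.

```python
def greedy_tokenize(text, vocab):
    """Greedy longest-match tokenization: at each position, emit the
    longest vocabulary entry that is a prefix of the remaining text.
    Assumes every single character appears in vocab, so it never stalls."""
    max_len = max(map(len, vocab))
    tokens = []
    i = 0
    while i < len(text):
        # try the longest candidate first, shrinking until a match is found
        for L in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + L]
            if piece in vocab:
                tokens.append(piece)
                i += L
                break
    return tokens

# Example: with vocab {"a", "b", "c", "ab", "abc"},
# "abcab" splits greedily into ["abc", "ab"].
```

Greedy matching makes tokenization a single deterministic left-to-right pass, which is what makes such tokenizers fast and simple to implement across many scripts and languages.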