

Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

April 8, 2024
Authors: Bo Peng, Daniel Goldstein, Quentin Anthony, Alon Albalak, Eric Alcaide, Stella Biderman, Eugene Cheah, Teddy Ferdinan, Haowen Hou, Przemysław Kazienko, Kranthi Kiran GV, Jan Kocoń, Bartłomiej Koptyra, Satyapriya Krishna, Ronald McClelland Jr., Niklas Muennighoff, Fares Obeid, Atsushi Saito, Guangyu Song, Haoqin Tu, Stanisław Woźniak, Ruichong Zhang, Bingchen Zhao, Qihang Zhao, Peng Zhou, Jian Zhu, Rui-Jie Zhu
cs.AI

Abstract

We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models that improve upon the RWKV (RWKV-4) architecture. Our architectural advancements include multi-headed matrix-valued states and a dynamic recurrence mechanism, which improve expressivity while maintaining the inference-efficiency characteristics of RNNs. We introduce a new multilingual corpus of 1.12 trillion tokens and a fast tokenizer based on greedy matching for enhanced multilinguality. We trained four Eagle models, ranging from 0.46 to 7.5 billion parameters, and two Finch models with 1.6 and 3.1 billion parameters, and found that they achieve competitive performance across a wide variety of benchmarks. We release all our models on HuggingFace under the Apache 2.0 license.

Models: https://huggingface.co/RWKV
Training code: https://github.com/RWKV/RWKV-LM
Inference code: https://github.com/RWKV/ChatRWKV
Time-parallel training code: https://github.com/RWKV/RWKV-infctx-trainer
