Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

Abstract

We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models that improve upon the RWKV (RWKV-4) architecture. Our architectural advances include multi-headed matrix-valued states and a dynamic recurrence mechanism, which improve expressivity while maintaining the inference efficiency characteristic of RNNs. We introduce a new multilingual corpus of 1.12 trillion tokens and a fast tokenizer based on greedy matching for improved multilingual support. We trained four Eagle models, ranging from 0.46 to 7.5 billion parameters, and two Finch models, with 1.6 and 3.1 billion parameters, and find that they achieve competitive performance across a wide variety of benchmarks. We release all our models on HuggingFace under the Apache 2.0 license.

Models: https://huggingface.co/RWKV
Training code: https://github.com/RWKV/RWKV-LM
Inference code: https://github.com/RWKV/ChatRWKV
Time-parallel training code: https://github.com/RWKV/RWKV-infctx-trainer
