Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

Abstract

We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models improving upon the RWKV (RWKV-4) architecture. Our architectural design advancements include multi-headed matrix-valued states and a dynamic recurrence mechanism that improve expressivity while maintaining the inference efficiency characteristics of RNNs. We introduce a new multilingual corpus with 1.12 trillion tokens and a fast tokenizer based on greedy matching for enhanced multilinguality. We trained four Eagle models, ranging from 0.46 to 7.5 billion parameters, and two Finch models with 1.6 and 3.1 billion parameters, and find that they achieve competitive performance across a wide variety of benchmarks. We release all our models on HuggingFace under the Apache 2.0 license.

Models at: https://huggingface.co/RWKV
Training code at: https://github.com/RWKV/RWKV-LM
Inference code at: https://github.com/RWKV/ChatRWKV
Time-parallel training code at: https://github.com/RWKV/RWKV-infctx-trainer
