Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

Abstract

We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models that improve upon the RWKV (RWKV-4) architecture. Our architectural advances include multi-headed matrix-valued states and a dynamic recurrence mechanism, which improve expressivity while retaining the inference efficiency characteristics of RNNs. We introduce a new multilingual corpus of 1.12 trillion tokens and a fast tokenizer based on greedy matching for enhanced multilinguality. We trained four Eagle models, ranging from 0.46 to 7.5 billion parameters, and two Finch models with 1.6 and 3.1 billion parameters, and found that they achieve competitive performance across a wide variety of benchmarks. We release all our models on HuggingFace under the Apache 2.0 license.

Models: https://huggingface.co/RWKV
Training code: https://github.com/RWKV/RWKV-LM
Inference code: https://github.com/RWKV/ChatRWKV
Time-parallel training code: https://github.com/RWKV/RWKV-infctx-trainer
