Aya 23: Open Weight Releases to Further Multilingual Progress
May 23, 2024
Authors: Viraat Aryabumi, John Dang, Dwarak Talupuru, Saurabh Dash, David Cairuz, Hangyu Lin, Bharat Venkitesh, Madeline Smith, Kelly Marchisio, Sebastian Ruder, Acyr Locatelli, Julia Kreutzer, Nick Frosst, Phil Blunsom, Marzieh Fadaee, Ahmet Üstün, Sara Hooker
cs.AI
Abstract
This technical report introduces Aya 23, a family of multilingual language
models. Aya 23 builds on the recent release of the Aya model (Üstün et al.,
2024), focusing on pairing a highly performant pre-trained model with the
recently released Aya collection (Singh et al., 2024). The result is a powerful
multilingual large language model serving 23 languages, expanding
state-of-the-art language modeling capabilities to approximately half of the
world's population. The Aya model covered 101 languages, whereas Aya 23 is an
experiment in depth versus breadth, exploring the impact of allocating more
capacity to fewer languages during pre-training. Aya 23 outperforms both
previous massively multilingual models such as Aya 101, on the languages it
covers, and widely used models such as Gemma, Mistral, and Mixtral across an
extensive range of discriminative and generative tasks. We release the open
weights for both the 8B and 35B models as part of our continued commitment to
expanding access to multilingual progress.
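Since the 8B and 35B weights are released openly, a minimal sketch of loading and prompting the smaller checkpoint with the Hugging Face transformers library is shown below; the repository id "CohereForAI/aya-23-8B", the chat-template call, and the example prompt are illustrative assumptions rather than details taken from the report.

```python
# A minimal sketch, assuming the released 8B weights are hosted on the
# Hugging Face Hub under "CohereForAI/aya-23-8B" (an assumed repository id).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/aya-23-8B"  # assumed repository id, for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to reduce memory use
    device_map="auto",          # place layers on available devices automatically
)

# Build a single-turn chat prompt via the tokenizer's chat template.
messages = [{"role": "user", "content": "Translate to Turkish: How are you today?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate a short completion and print only the newly generated tokens.
output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.3)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```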