Aya 23: Open Weight Releases to Further Multilingual Progress
May 23, 2024
Authors: Viraat Aryabumi, John Dang, Dwarak Talupuru, Saurabh Dash, David Cairuz, Hangyu Lin, Bharat Venkitesh, Madeline Smith, Kelly Marchisio, Sebastian Ruder, Acyr Locatelli, Julia Kreutzer, Nick Frosst, Phil Blunsom, Marzieh Fadaee, Ahmet Üstün, Sara Hooker
cs.AI
Abstract
This technical report introduces Aya 23, a family of multilingual language models. Aya 23 builds on the recent release of the Aya model (Üstün et al., 2024), focusing on pairing a highly performant pre-trained model with the recently released Aya collection (Singh et al., 2024). The result is a powerful multilingual large language model serving 23 languages, expanding state-of-the-art language modeling capabilities to approximately half of the world's population. The Aya model covered 101 languages, whereas Aya 23 is an experiment in depth vs. breadth, exploring the impact of allocating more capacity to fewer languages included during pre-training. Aya 23 outperforms both previous massively multilingual models like Aya 101 on the languages it covers and widely used models like Gemma, Mistral, and Mixtral across an extensive range of discriminative and generative tasks. We release the open weights for both the 8B and 35B models as part of our continued commitment to expanding access to multilingual progress.
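Since the report states that the 8B and 35B weights are openly released, the following is a minimal sketch of how one might load a checkpoint with the Hugging Face transformers library. The repository identifier "CohereForAI/aya-23-8B" and the example prompt are assumptions for illustration and are not taken from the report; consult the official release page for the actual model identifiers.

# Minimal sketch: loading an openly released Aya 23 checkpoint with Hugging Face transformers.
# The repo id below is an assumption; a 35B checkpoint is also said to be released.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/aya-23-8B"  # assumed repository id, not given in the abstract

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Format a chat-style prompt (here, a Turkish instruction) and generate a short completion.
messages = [{"role": "user", "content": "Türkçe birkaç cümleyle kendini tanıt."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))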