

Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model

February 12, 2024
Authors: Ahmet Üstün, Viraat Aryabumi, Zheng-Xin Yong, Wei-Yin Ko, Daniel D'souza, Gbemileke Onilude, Neel Bhandari, Shivalika Singh, Hui-Lee Ooi, Amr Kayid, Freddie Vargus, Phil Blunsom, Shayne Longpre, Niklas Muennighoff, Marzieh Fadaee, Julia Kreutzer, Sara Hooker
cs.AI

Abstract

Recent breakthroughs in large language models (LLMs) have centered around a handful of data-rich languages. What does it take to broaden access to breakthroughs beyond first-class citizen languages? Our work introduces Aya, a massively multilingual generative language model that follows instructions in 101 languages, of which over 50% are considered lower-resourced. Aya outperforms mT0 and BLOOMZ on the majority of tasks while covering double the number of languages. We introduce extensive new evaluation suites that broaden the state-of-the-art for multilingual evaluation across 99 languages, including discriminative and generative tasks, human evaluation, and simulated win rates that cover both held-out tasks and in-distribution performance. Furthermore, we conduct detailed investigations into the optimal finetuning mixture composition, data pruning, as well as the toxicity, bias, and safety of our models. We open-source our instruction datasets and our model at https://hf.co/CohereForAI/aya-101.
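
Since the released checkpoint is hosted on the Hugging Face Hub, it can presumably be loaded through the standard transformers sequence-to-sequence interface. The following is a minimal sketch assuming the aya-101 checkpoint exposes the usual seq2seq API; the checkpoint name comes from the paper's release URL, while the prompt and generation parameters are purely illustrative.

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Checkpoint name taken from the paper's open-source release URL.
checkpoint = "CohereForAI/aya-101"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# Note: loading the full model requires substantial memory.
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Aya follows instructions in 101 languages; the prompt may be in any of them.
inputs = tokenizer("Translate to Turkish: How are you?", return_tensors="pt")
outputs = model.generate(inputs.input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))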