

Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model

February 12, 2024
作者: Ahmet Üstün, Viraat Aryabumi, Zheng-Xin Yong, Wei-Yin Ko, Daniel D'souza, Gbemileke Onilude, Neel Bhandari, Shivalika Singh, Hui-Lee Ooi, Amr Kayid, Freddie Vargus, Phil Blunsom, Shayne Longpre, Niklas Muennighoff, Marzieh Fadaee, Julia Kreutzer, Sara Hooker
cs.AI

Abstract

Recent breakthroughs in large language models (LLMs) have centered around a handful of data-rich languages. What does it take to broaden access to breakthroughs beyond first-class citizen languages? Our work introduces Aya, a massively multilingual generative language model that follows instructions in 101 languages, of which over 50% are considered lower-resourced. Aya outperforms mT0 and BLOOMZ on the majority of tasks while covering double the number of languages. We introduce extensive new evaluation suites that broaden the state-of-the-art for multilingual evaluation across 99 languages -- including discriminative and generative tasks, human evaluation, and simulated win rates that cover both held-out tasks and in-distribution performance. Furthermore, we conduct detailed investigations of the optimal finetuning mixture composition, data pruning, as well as the toxicity, bias, and safety of our models. We open-source our instruction datasets and our model at https://hf.co/CohereForAI/aya-101.
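As a minimal sketch of the win-rate metric mentioned in the abstract: a win rate aggregates pairwise comparisons of one model's outputs against a baseline's. The paper's exact judging setup is not described here; the tie-counts-as-half convention and the function below are illustrative assumptions, not the authors' implementation.

```python
def win_rate(outcomes):
    """Aggregate pairwise comparison outcomes into a win rate.

    outcomes: list of strings, each 'win', 'loss', or 'tie', recording
    how the candidate model fared against a baseline on one prompt.
    Ties count as half a win (a common convention; an assumption here).
    """
    if not outcomes:
        raise ValueError("need at least one comparison")
    score = sum(1.0 if o == "win" else 0.5 if o == "tie" else 0.0
                for o in outcomes)
    return score / len(outcomes)

# Example: 6 wins, 3 losses, 1 tie over 10 prompts -> (6 + 0.5) / 10
print(win_rate(["win"] * 6 + ["loss"] * 3 + ["tie"]))  # → 0.65
```

In practice the per-prompt verdicts would come from human annotators or an LLM judge comparing two models' generations; the aggregation step itself is this simple.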