Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model

Abstract

Recent breakthroughs in large language models (LLMs) have centered on a handful of data-rich languages. What does it take to broaden access to these breakthroughs beyond first-class citizen languages? Our work introduces Aya, a massively multilingual generative language model that follows instructions in 101 languages, of which over 50% are considered lower-resourced. Aya outperforms mT0 and BLOOMZ on the majority of tasks while covering double the number of languages. We introduce extensive new evaluation suites that broaden the state of the art for multilingual evaluation across 99 languages, including discriminative and generative tasks, human evaluation, and simulated win rates that cover both held-out tasks and in-distribution performance. Furthermore, we conduct detailed investigations of the optimal finetuning mixture composition, data pruning, and the toxicity, bias, and safety of our models. We open-source our instruction datasets and our model at https://hf.co/CohereForAI/aya-101.

