Tiny Aya: Bridging Scale and Multilingual Depth

Abstract

Tiny Aya redefines what a small multilingual language model can achieve. Trained on 70 languages and refined through region-aware post-training, it delivers state-of-the-art translation quality, strong multilingual understanding, and high-quality target-language generation, all with just 3.35B parameters. The release includes a pretrained foundation model, a globally balanced instruction-tuned variant, and three region-specialized models targeting languages from Africa, South Asia, Europe, Asia-Pacific, and West Asia. This report details the training strategy, data composition, and comprehensive evaluation framework behind Tiny Aya, and presents an alternative scaling path for multilingual AI: one centered on efficiency, balanced performance across languages, and practical deployment.
