SUTRA: Scalable Multilingual Language Model Architecture
May 7, 2024
Authors: Abhijit Bendale, Michael Sapienza, Steven Ripplinger, Simon Gibbs, Jaewon Lee, Pranav Mistry
cs.AI
Abstract
In this paper, we introduce SUTRA, a multilingual Large Language Model
architecture capable of understanding, reasoning, and generating text in over
50 languages. SUTRA's design uniquely decouples core conceptual understanding
from language-specific processing, which facilitates scalable and efficient
multilingual alignment and learning. Employing a Mixture of Experts framework
in both language and concept processing, SUTRA demonstrates computational
efficiency and responsiveness. Through extensive evaluations, SUTRA is shown
to surpass existing models such as GPT-3.5 and Llama2 by 20-30% on the leading
Massive Multitask Language Understanding (MMLU) benchmark for multilingual
tasks. SUTRA models are also online LLMs that can use knowledge from the
internet to provide hallucination-free, factual, and up-to-date
responses while retaining their multilingual capabilities. Furthermore, we
explore the broader implications of its architecture for the future of
multilingual AI, highlighting its potential to democratize access to AI
technology globally and to improve the equity and utility of AI in regions with
predominantly non-English languages. Our findings suggest that SUTRA not only
fills pivotal gaps in multilingual model capabilities but also establishes a
new benchmark for operational efficiency and scalability in AI applications.
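
The abstract gives no implementation details, but the core architectural idea it describes (language-specific input and output processing around a shared, language-agnostic concept core built from Mixture-of-Experts layers) can be sketched. Below is a minimal, hypothetical PyTorch illustration; the module names, dimensions, and top-1 routing scheme are assumptions made for exposition, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    """Simplified token-level top-1 Mixture-of-Experts feed-forward layer."""

    def __init__(self, d_model: int, n_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model). Route every token to its top-1 expert,
        # scaling the expert output by its routing probability (Switch-style).
        probs = F.softmax(self.router(x), dim=-1)   # (batch, seq, n_experts)
        top = probs.argmax(dim=-1)                  # (batch, seq)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top == i
            if mask.any():
                out[mask] = probs[mask][:, i : i + 1] * expert(x[mask])
        return out


class SutraLikeModel(nn.Module):
    """Decoupled design: per-language embeddings/heads, shared MoE concept core."""

    def __init__(self, vocab_sizes: dict, d_model: int = 256, n_layers: int = 2):
        super().__init__()
        # Language-specific layers ("language processing").
        self.embed = nn.ModuleDict(
            {lang: nn.Embedding(v, d_model) for lang, v in vocab_sizes.items()}
        )
        self.head = nn.ModuleDict(
            {lang: nn.Linear(d_model, v) for lang, v in vocab_sizes.items()}
        )
        # Shared, language-agnostic core ("concept processing") with MoE blocks.
        self.core = nn.ModuleList([MoELayer(d_model) for _ in range(n_layers)])

    def forward(self, tokens: torch.Tensor, lang: str) -> torch.Tensor:
        h = self.embed[lang](tokens)   # map language tokens into concept space
        for layer in self.core:
            h = h + layer(h)           # residual connection around each MoE block
        return self.head[lang](h)      # project back into the language's vocabulary


if __name__ == "__main__":
    model = SutraLikeModel({"en": 1000, "hi": 1200})
    logits = model(torch.randint(0, 1000, (2, 8)), lang="en")
    print(logits.shape)  # torch.Size([2, 8, 1000])
```

In this sketch only the thin embedding and output layers differ per language while the MoE core is shared, so adding a language would not require retraining the core; that is one plausible reading of the scalability and multilingual-alignment claims in the abstract, not a description of the paper's actual training procedure.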