SUTRA: Scalable Multilingual Language Model Architecture
May 7, 2024
Authors: Abhijit Bendale, Michael Sapienza, Steven Ripplinger, Simon Gibbs, Jaewon Lee, Pranav Mistry
cs.AI
Abstract
In this paper, we introduce SUTRA, a multilingual Large Language Model
architecture capable of understanding, reasoning, and generating text in over
50 languages. SUTRA's design uniquely decouples core conceptual understanding
from language-specific processing, which facilitates scalable and efficient
multilingual alignment and learning. Employing a Mixture of Experts framework
in both language and concept processing, SUTRA demonstrates computational
efficiency and responsiveness. Through extensive evaluations, SUTRA is
demonstrated to surpass existing models such as GPT-3.5 and Llama2 by 20-30% on
leading Massive Multitask Language Understanding (MMLU) benchmarks for
multilingual tasks. SUTRA models are also online LLMs that can use knowledge
from the internet to provide hallucination-free, factual, and up-to-date
responses while retaining their multilingual capabilities. Furthermore, we
explore the broader implications of its architecture for the future of
multilingual AI, highlighting its potential to democratize access to AI
technology globally and to improve the equity and utility of AI in regions with
predominantly non-English languages. Our findings suggest that SUTRA not only
fills pivotal gaps in multilingual model capabilities but also establishes a
new benchmark for operational efficiency and scalability in AI applications.
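The abstract's two architectural ideas, a language-neutral concept core and Mixture-of-Experts processing, can be made concrete. Below is a minimal, hypothetical PyTorch sketch of that decoupling; the class names, the top-1 routing, and the per-language embedding/projection adapters are illustrative assumptions, not implementation details taken from the paper.

```python
# Hypothetical sketch (names and routing choices are illustrative, not from
# the paper): language-specific adapters map text into a shared "concept"
# space, where a Mixture-of-Experts core does the language-neutral processing.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Simplified top-1 Mixture-of-Experts feed-forward layer."""
    def __init__(self, d_model: int, n_experts: int = 8):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); send each token to its best-scoring
        # expert, so only one expert runs per token (sparse compute).
        weight, idx = F.softmax(self.gate(x), dim=-1).max(dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                out[mask] = expert(x[mask])
        return out * weight.unsqueeze(-1)

class DecoupledMultilingualLM(nn.Module):
    """Per-language adapters around one shared concept core."""
    def __init__(self, vocab_sizes: dict[str, int], d_model: int = 512):
        super().__init__()
        self.encoders = nn.ModuleDict(
            {lang: nn.Embedding(v, d_model) for lang, v in vocab_sizes.items()})
        # Shared across all languages; a real model would stack full
        # transformer blocks with MoE feed-forwards here.
        self.core = MoELayer(d_model)
        self.decoders = nn.ModuleDict(
            {lang: nn.Linear(d_model, v) for lang, v in vocab_sizes.items()})

    def forward(self, tokens: torch.Tensor, src: str, tgt: str) -> torch.Tensor:
        concepts = self.core(self.encoders[src](tokens))  # language-neutral
        return self.decoders[tgt](concepts)               # back to tgt tokens

# Example: one shared core serving English and Hindi token spaces.
model = DecoupledMultilingualLM({"en": 32000, "hi": 32000})
logits = model(torch.randint(0, 32000, (2, 16)), src="en", tgt="hi")
```

Under this sketch, reasoning capacity lives only in the shared core, so supporting a new language means training a lightweight encoder/decoder pair against an already-aligned concept space; that is one plausible reading of how such a design keeps multilingual scaling efficient.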