OLMoE: Open Mixture-of-Experts Language Models

September 3, 2024
Authors: Niklas Muennighoff, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Jacob Morrison, Sewon Min, Weijia Shi, Pete Walsh, Oyvind Tafjord, Nathan Lambert, Yuling Gu, Shane Arora, Akshita Bhagia, Dustin Schwenk, David Wadden, Alexander Wettig, Binyuan Hui, Tim Dettmers, Douwe Kiela, Ali Farhadi, Noah A. Smith, Pang Wei Koh, Amanpreet Singh, Hannaneh Hajishirzi
cs.AI

Abstract

We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). OLMoE-1B-7B has 7 billion (B) parameters but uses only 1B per input token. We pretrain it on 5 trillion tokens and further adapt it to create OLMoE-1B-7B-Instruct. Our models outperform all available models with similar active parameters, even surpassing larger ones like Llama2-13B-Chat and DeepSeekMoE-16B. We present various experiments on MoE training, analyze routing in our model showing high specialization, and open-source all aspects of our work: model weights, training data, code, and logs.
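To make the "sparse" aspect concrete: in an MoE layer, a learned router sends each token through only a few of the available expert feed-forward networks, so the parameters active per token (about 1B for OLMoE-1B-7B) are a small fraction of the total (about 7B). The sketch below is a minimal, illustrative top-k token-choice MoE layer in PyTorch; the expert count, hidden sizes, and k are placeholder values, not OLMoE's actual configuration.

```python
# Minimal sketch of a sparse Mixture-of-Experts feed-forward layer with
# top-k token-choice routing. All sizes here are illustrative placeholders,
# not OLMoE's real hyperparameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router produces one score per expert for each token.
        self.router = nn.Linear(d_model, num_experts, bias=False)
        # Each expert is an independent small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)              # routing probabilities
        weights, indices = probs.topk(self.top_k, dim=-1)      # keep only k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize the chosen weights
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # Tokens for which expert e is among the top-k choices.
            token_idx, slot = (indices == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue  # this expert receives no tokens in this batch
            out[token_idx] += weights[token_idx, slot, None] * expert(x[token_idx])
        return out

# Only top_k experts run per token, so the compute and active parameters per
# token are a fraction of the layer's total parameter count.
layer = MoELayer()
tokens = torch.randn(4, 512)
print(layer(tokens).shape)  # torch.Size([4, 512])
```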
