
OLMoE: Open Mixture-of-Experts Language Models

September 3, 2024
Authors: Niklas Muennighoff, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Jacob Morrison, Sewon Min, Weijia Shi, Pete Walsh, Oyvind Tafjord, Nathan Lambert, Yuling Gu, Shane Arora, Akshita Bhagia, Dustin Schwenk, David Wadden, Alexander Wettig, Binyuan Hui, Tim Dettmers, Douwe Kiela, Ali Farhadi, Noah A. Smith, Pang Wei Koh, Amanpreet Singh, Hannaneh Hajishirzi
cs.AI

Abstract

We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). OLMoE-1B-7B has 7 billion (B) parameters but uses only 1B per input token. We pretrain it on 5 trillion tokens and further adapt it to create OLMoE-1B-7B-Instruct. Our models outperform all available models with similar active parameters, even surpassing larger ones like Llama2-13B-Chat and DeepSeekMoE-16B. We present various experiments on MoE training, analyze routing in our model showing high specialization, and open-source all aspects of our work: model weights, training data, code, and logs.
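For readers unfamiliar with why a 7B-parameter model can use only about 1B parameters per token, the sketch below shows a generic sparse MoE feed-forward layer with top-k routing: a router scores all experts for each token, and only the top-k experts are evaluated. This is a minimal illustration, not the OLMoE implementation; the dimensions, expert count, and top-k value are placeholder defaults chosen for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoELayer(nn.Module):
    """Minimal sparse Mixture-of-Experts feed-forward layer (illustrative only).

    d_model, d_ff, n_experts, and top_k below are placeholder values,
    not the OLMoE-1B-7B configuration.
    """

    def __init__(self, d_model=1024, d_ff=2048, n_experts=64, top_k=8):
        super().__init__()
        self.top_k = top_k
        # Router produces one logit per expert for each token.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (n_tokens, d_model)
        weights = F.softmax(self.router(x), dim=-1)          # (n_tokens, n_experts)
        topk_w, topk_idx = weights.topk(self.top_k, dim=-1)  # keep only top-k experts
        topk_w = topk_w / topk_w.sum(dim=-1, keepdim=True)   # renormalize routing weights
        out = torch.zeros_like(x)
        # Each token is processed only by its top-k experts, so only a
        # fraction of the layer's total parameters is active per token.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += topk_w[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


# Usage sketch: route a small batch of token representations.
layer = SparseMoELayer()
tokens = torch.randn(16, 1024)
print(layer(tokens).shape)  # torch.Size([16, 1024])
```

The paper's routing analysis (reported as showing high expert specialization) concerns which experts the router selects for which tokens; the loop above makes that selection explicit, whereas efficient implementations typically batch tokens per expert instead.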
