

OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models

January 29, 2024
作者: Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou, Yang You
cs.AI

Abstract

To help the open-source community better understand Mixture-of-Experts (MoE) based large language models (LLMs), we train and release OpenMoE, a series of fully open-sourced and reproducible decoder-only MoE LLMs, ranging from 650M to 34B parameters and trained on up to over 1T tokens. Our investigation confirms that MoE-based LLMs can offer a more favorable cost-effectiveness trade-off than dense LLMs, highlighting their potential for future LLM development. Another important contribution of this study is an in-depth analysis of the routing mechanisms within our OpenMoE models, leading to three significant findings: Context-Independent Specialization, Early Routing Learning, and Drop-towards-the-End. We discovered that routing decisions in MoE models are predominantly based on token IDs, with minimal context relevance. The token-to-expert assignments are determined early in the pre-training phase and remain largely unchanged. This imperfect routing can result in performance degradation, particularly in sequential tasks like multi-turn conversations, where tokens appearing later in a sequence are more likely to be dropped. Finally, we rethink our design based on the above-mentioned observations and analysis. To facilitate future MoE LLM development, we propose potential strategies for mitigating the issues we found and further improving off-the-shelf MoE LLM designs.
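To make the token-ID-based routing and Drop-towards-the-End observations concrete, here is a minimal, purely illustrative sketch (not OpenMoE's actual router; the `route_with_capacity` function, token values, and expert count are hypothetical) of top-1 token-choice routing with a fixed per-expert capacity: once an expert's capacity is filled by earlier tokens, later occurrences routed to the same expert are dropped.

```python
# Toy sketch of capacity-based top-1 MoE routing (illustrative only).
# Each expert processes at most `capacity` tokens per batch, so tokens
# appearing later in the sequence are the ones most likely to be dropped.

def route_with_capacity(token_ids, num_experts=4, capacity=2):
    # Stand-in for the learned router: the abstract reports that routing is
    # largely a function of the token ID, so we map IDs to experts directly.
    assignments = [tid % num_experts for tid in token_ids]

    load = [0] * num_experts
    kept, dropped = [], []
    for pos, expert in enumerate(assignments):   # iterate in sequence order
        if load[expert] < capacity:
            load[expert] += 1
            kept.append(pos)
        else:
            dropped.append(pos)                  # overflow token is skipped
    return kept, dropped

kept, dropped = route_with_capacity(token_ids=[7, 7, 3, 7, 1, 7, 3, 7, 9, 2])
print("kept positions:   ", kept)      # early occurrences fit within capacity
print("dropped positions:", dropped)   # later occurrences overflow and drop
```

Because practical MoE implementations impose a similar per-batch capacity limit, tokens late in a sequence (e.g., later turns of a conversation) are the ones most exposed to being dropped, which is the degradation mode the abstract highlights.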