

OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models

January 29, 2024
Authors: Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou, Yang You
cs.AI

Abstract

To help the open-source community better understand Mixture-of-Experts (MoE) based large language models (LLMs), we train and release OpenMoE, a series of fully open-sourced and reproducible decoder-only MoE LLMs, ranging from 650M to 34B parameters and trained on up to over 1T tokens. Our investigation confirms that MoE-based LLMs can offer a more favorable cost-effectiveness trade-off than dense LLMs, highlighting their potential for future LLM development. Another important contribution of this study is an in-depth analysis of the routing mechanisms within our OpenMoE models, leading to three significant findings: Context-Independent Specialization, Early Routing Learning, and Drop-towards-the-End. We discovered that routing decisions in MoE models are predominantly based on token IDs, with minimal context relevance. The token-to-expert assignments are determined early in the pre-training phase and remain largely unchanged. This imperfect routing can result in performance degradation, particularly in sequential tasks such as multi-turn conversations, where tokens appearing later in a sequence are more likely to be dropped. Finally, we rethink our design based on the above observations and analysis. To facilitate future MoE LLM development, we propose potential strategies for mitigating the issues we found and further improving off-the-shelf MoE LLM designs.
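To make the Drop-towards-the-End finding concrete, below is a minimal sketch (not the OpenMoE implementation; the function and parameter names such as `route_top1` and `capacity_factor` are hypothetical) of capacity-limited top-1 routing. Each expert accepts at most a fixed number of tokens per sequence, so once a heavily preferred expert fills up, the tokens that reach it later in the sequence overflow and are dropped.

```python
# Minimal sketch of capacity-limited top-1 MoE routing, illustrating the
# "Drop-towards-the-End" effect. Illustrative only; not OpenMoE's code.
import numpy as np

def route_top1(router_logits: np.ndarray, capacity_factor: float = 1.25):
    """Assign each token to its top-1 expert, dropping overflow tokens.

    router_logits: [num_tokens, num_experts] router scores.
    Returns (assignment, dropped): assignment[i] is the chosen expert
    index, or -1 if token i was dropped.
    """
    num_tokens, num_experts = router_logits.shape
    # Each expert can process at most `capacity` tokens per sequence.
    capacity = int(capacity_factor * num_tokens / num_experts)

    top1 = router_logits.argmax(axis=-1)            # preferred expert per token
    load = np.zeros(num_experts, dtype=int)          # tokens accepted so far
    assignment = np.full(num_tokens, -1, dtype=int)

    # Tokens are considered in sequence order, so once an expert reaches
    # capacity, it is the *later* tokens that overflow and get dropped.
    for t in range(num_tokens):
        e = top1[t]
        if load[e] < capacity:
            assignment[t] = e
            load[e] += 1
    return assignment, assignment == -1

# Example: 16 tokens, 4 experts; skewing the logits toward expert 0 mimics
# imbalanced, context-independent routing, so later tokens get dropped.
rng = np.random.default_rng(0)
logits = rng.normal(size=(16, 4))
logits[:, 0] += 2.0
assignment, dropped = route_top1(logits)
print("dropped token positions:", np.nonzero(dropped)[0])
```

Under these assumptions, the dropped positions cluster toward the end of the sequence, which is why later turns of a multi-turn conversation suffer most from imperfect routing.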