

Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts

August 28, 2024
Authors: Lean Wang, Huazuo Gao, Chenggang Zhao, Xu Sun, Damai Dai
cs.AI

Abstract

For Mixture-of-Experts (MoE) models, an unbalanced expert load leads to routing collapse or increased computational overhead. Existing methods commonly employ an auxiliary loss to encourage load balance, but a large auxiliary loss introduces non-negligible interference gradients into training and thus impairs model performance. To control load balance without producing undesired gradients during training, we propose Loss-Free Balancing, an auxiliary-loss-free load balancing strategy. Specifically, before the top-K routing decision, Loss-Free Balancing first applies an expert-wise bias to the routing scores of each expert. By dynamically updating each expert's bias according to its recent load, Loss-Free Balancing consistently maintains a balanced distribution of expert load. In addition, since Loss-Free Balancing produces no interference gradients, it also raises the upper bound of model performance attainable from MoE training. We validate Loss-Free Balancing on MoE models with up to 3B parameters trained on up to 200B tokens. Experimental results show that Loss-Free Balancing achieves both better performance and better load balance than traditional auxiliary-loss-controlled load balancing strategies.
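
The abstract describes the mechanism in full: a per-expert bias is added to the routing scores before top-K selection, and each bias is nudged according to that expert's recent load. Below is a minimal NumPy sketch of that loop. The sign-based update with a fixed rate u reflects the update rule the paper reports; all function names, shapes, the step size value, and the random toy routing scores are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def topk_with_bias(scores, bias, k):
    """Select top-k experts per token using biased scores.

    The bias only influences WHICH experts are chosen; the returned
    gate weights come from the original (unbiased) scores, so the
    balancing mechanism contributes no gradient of its own.
    """
    biased = scores + bias                        # (tokens, experts)
    topk_idx = np.argsort(-biased, axis=-1)[:, :k]
    gates = np.take_along_axis(scores, topk_idx, axis=-1)
    return topk_idx, gates

def update_bias(bias, topk_idx, num_experts, u=0.001):
    """Nudge each expert's bias toward the mean load (sign update)."""
    load = np.bincount(topk_idx.ravel(), minlength=num_experts)
    err = load.mean() - load                      # positive => under-loaded
    return bias + u * np.sign(err)

# Toy usage: 8 tokens routed among 4 experts with top-2 routing.
rng = np.random.default_rng(0)
num_experts, k = 4, 2
bias = np.zeros(num_experts)
for step in range(100):
    scores = rng.random((8, num_experts))         # stand-in for gating scores
    topk_idx, gates = topk_with_bias(scores, bias, k)
    bias = update_bias(bias, topk_idx, num_experts)
```

Note the design choice the sketch encodes: the bias steers only expert selection, while the gating weights applied to expert outputs still come from the original scores, which is why the method avoids the interference gradients that an auxiliary loss would introduce.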
