Yuan 2.0-M32: Mixture of Experts with Attention Router

Abstract

Yuan 2.0-M32, with a base architecture similar to that of Yuan-2.0 2B, uses a mixture-of-experts (MoE) architecture with 32 experts, of which 2 are active. A new router network, the Attention Router, is proposed and adopted for more efficient expert selection, improving accuracy by 3.8% compared with a model that uses a classical router network. Yuan 2.0-M32 is trained from scratch on 2000B tokens, and its training computation is only 9.25% of that of a dense model at the same parameter scale. Yuan 2.0-M32 demonstrates competitive capability in coding, math, and various domains of expertise, with only 3.7B active parameters out of 40B in total and 7.4 GFLOPs of forward computation per token, both only 1/19 of those of Llama3-70B. Yuan 2.0-M32 surpasses Llama3-70B on the MATH and ARC-Challenge benchmarks, with accuracies of 55.89 and 95.8, respectively. The models and source code of Yuan 2.0-M32 are released on GitHub.
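To make "32 experts, of which 2 are active" concrete, here is a minimal pure-Python sketch of the classical top-2 softmax router that the Attention Router is compared against. The abstract does not describe the Attention Router's internals, so it is not implemented here; presumably it would replace the expert-scoring step while top-2 selection stays the same. All names (`classical_top2_route`, `moe_forward`, `approx_forward_gflops`) are hypothetical and not taken from the Yuan 2.0-M32 code release. The router is assumed to be a single linear layer, the two selected gate weights are assumed to be renormalized to sum to 1, and the experts are toy functions rather than real feed-forward networks. `approx_forward_gflops` uses the common rule of thumb of about 2 FLOPs per active parameter per token; the abstract's 3.7B and 7.4 GFLOPs figures are consistent with it, but the paper's own accounting may differ.

```python
import math
import random

NUM_EXPERTS = 32  # experts per MoE layer, as stated in the abstract
TOP_K = 2         # active experts, as stated in the abstract


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def classical_top2_route(hidden, router_weights, k=TOP_K):
    """Pick the k highest-probability experts for one token.

    hidden: one token's hidden state, a list of d floats.
    router_weights: NUM_EXPERTS rows, each a list of d floats (hypothetical linear router).
    Returns [(expert_index, gate_weight), ...] sorted by descending weight.
    """
    # One logit per expert: dot product of its router row with the hidden state.
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in router_weights]
    probs = softmax(logits)
    # Keep only the k most probable experts; the rest are skipped for this token.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Assumption: renormalize the selected probabilities so the gates sum to 1.
    kept = sum(probs[i] for i in top)
    return [(i, probs[i] / kept) for i in top]


def moe_forward(hidden, router_weights, experts):
    """Gate-weighted sum of the outputs of the selected experts only."""
    out = [0.0] * len(hidden)
    for idx, gate in classical_top2_route(hidden, router_weights):
        expert_out = experts[idx](hidden)
        out = [o + gate * e for o, e in zip(out, expert_out)]
    return out


def approx_forward_gflops(active_params):
    """Rule of thumb: about 2 FLOPs (multiply + add) per active parameter per token."""
    return 2 * active_params / 1e9


if __name__ == "__main__":
    random.seed(0)
    d = 8
    router_weights = [[random.gauss(0, 1) for _ in range(d)] for _ in range(NUM_EXPERTS)]
    # Toy experts: expert i scales its input by (i + 1).
    experts = [lambda h, s=i + 1: [s * x for x in h] for i in range(NUM_EXPERTS)]
    token = [random.gauss(0, 1) for _ in range(d)]
    print(classical_top2_route(token, router_weights))
    print(moe_forward(token, router_weights, experts))
    # 3.7B active parameters give roughly 7.4 GFLOPs per token under the rule of thumb.
    print(approx_forward_gflops(3.7e9))
    # Dense Llama3-70B parameters vs 3.7B active parameters: a ratio of roughly 19.
    print(70e9 / 3.7e9)
```

Only the two selected experts are evaluated per token, which is why the active parameter count (and hence per-token compute) is a small fraction of the 40B total.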
