JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Abstract

Large Language Models (LLMs) have achieved remarkable results, but their growing resource demands have become a major obstacle to the development of powerful and accessible super-human intelligence. This report introduces JetMoE-8B, a new LLM trained for less than $0.1 million, using 1.25T tokens from carefully mixed open-source corpora and 30,000 H100 GPU hours. Despite its low cost, JetMoE-8B demonstrates impressive performance: the base model outperforms Llama2-7B, and JetMoE-8B-Chat surpasses Llama2-13B-Chat. These results suggest that LLM training can be much more cost-effective than generally thought. JetMoE-8B is based on an efficient Sparsely-gated Mixture-of-Experts (SMoE) architecture composed of attention and feedforward experts. Both kinds of layer are sparsely activated, so JetMoE-8B has 8B parameters in total but activates only 2B for each input token, reducing inference computation by about 70% compared to Llama2-7B. Moreover, JetMoE-8B is highly open and academia-friendly, using only public datasets and training code. All training parameters and data mixtures are detailed in this report to facilitate future work on open foundation models. This transparency aims to encourage collaboration and further advances in accessible and efficient LLMs. The model weights are publicly available at https://github.com/myshell-ai/JetMoE.
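
The headline figures in the abstract can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the report. It assumes that per-token inference compute scales roughly linearly with the number of active parameters, and it treats Llama2-7B as a dense model with 7B active parameters. The function names are hypothetical. Under those assumptions, 2B active parameters against 7B gives a reduction of about 71%, in line with the "about 70%" quoted above, and the stated budget implies an upper bound of roughly $3.33 per H100 GPU hour.

```python
# Back-of-the-envelope checks on the figures quoted in the abstract.
# Assumption (not from the report): per-token inference compute scales
# roughly linearly with the number of active parameters.

BUDGET_USD = 100_000      # "less than $0.1 million", used as an upper bound
GPU_HOURS = 30_000        # H100 GPU hours quoted in the abstract
TOTAL_PARAMS = 8e9        # JetMoE-8B total parameters
ACTIVE_PARAMS = 2e9       # parameters activated per input token
LLAMA2_7B_PARAMS = 7e9    # assumed dense baseline: all parameters active


def implied_cost_per_gpu_hour(budget_usd: float, gpu_hours: float) -> float:
    """Upper bound on the average price per GPU hour implied by the budget."""
    return budget_usd / gpu_hours


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active / total


def compute_reduction(active: float, dense_baseline: float) -> float:
    """Estimated relative saving in per-token compute versus a dense baseline."""
    return 1.0 - active / dense_baseline


if __name__ == "__main__":
    cost = implied_cost_per_gpu_hour(BUDGET_USD, GPU_HOURS)
    share = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    saving = compute_reduction(ACTIVE_PARAMS, LLAMA2_7B_PARAMS)
    print(f"Implied cost per GPU hour (upper bound): ${cost:.2f}")
    print(f"Active parameter share per token: {share:.0%}")
    print(f"Estimated compute reduction vs Llama2-7B: {saving:.0%}")
```

The parameter-ratio estimate ignores costs that do not scale with parameter count, such as attention over long contexts and routing overhead, so it should be read as a rough consistency check rather than a measurement.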
