FLM-101B: An Open LLM and How to Train It with $100K Budget

Abstract

Large language models (LLMs) have achieved remarkable success in NLP and multimodal tasks. Despite these successes, their development faces two main challenges: (i) high computational cost; and (ii) difficulty in conducting fair and objective evaluations. LLMs are prohibitively expensive to train, so only a few major players can afford to do so, which constrains both research and application opportunities. This underscores the importance of cost-effective LLM training. In this paper, we use a growth strategy to significantly reduce LLM training cost. We demonstrate that an LLM with 101B parameters and 0.31TB tokens can be trained on a $100K budget. We also adopt a systematic evaluation paradigm for the IQ evaluation of LLMs, to complement existing evaluations that focus more on knowledge-oriented abilities. We introduce a benchmark that evaluates important aspects of intelligence, including symbolic mapping, rule understanding, pattern mining, and anti-interference. Such evaluations minimize the potential impact of memorization. Experimental results show that our model FLM-101B, trained with a budget of $100K, achieves performance comparable to powerful and well-known models, e.g., GPT-3 and GLM-130B, especially in the IQ benchmark evaluations with contexts unseen in training data. The checkpoint of FLM-101B will be open-sourced at https://huggingface.co/CofeAI/FLM-101B.

