FLM-101B: An Open LLM and How to Train It with $100K Budget
September 7, 2023
Authors: Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Xuying Meng, Siqi Fan, Peng Han, Jing Li, Li Du, Bowen Qin, Zheng Zhang, Aixin Sun, Yequan Wang
cs.AI
Abstract
Large language models (LLMs) have achieved remarkable success in NLP and
multimodal tasks. Despite these successes, their development faces two main
challenges: (i) high computational cost; and (ii) difficulty in conducting fair
and objective evaluations. LLMs are prohibitively expensive, making it feasible
for only a few major players to undertake their training, thereby constraining
both research and application opportunities. This underscores the importance of
cost-effective LLM training. In this paper, we utilize a growth strategy to
significantly reduce LLM training cost. We demonstrate that an LLM with 101B
parameters and 0.31TB tokens can be trained on a $100K budget. We also adopt a
systematic evaluation paradigm for the IQ evaluation of LLMs, complementing
existing evaluations that focus more on knowledge-oriented abilities. We
introduce our benchmark, which includes evaluations of important aspects of
intelligence: symbolic mapping, rule understanding, pattern mining,
and anti-interference. Such evaluations minimize the potential impact of
memorization. Experimental results show that our model FLM-101B, trained with a
budget of $100K, achieves performance comparable to powerful and well-known
models, e.g., GPT-3 and GLM-130B, especially in the IQ benchmark evaluations with
contexts unseen in training data. The checkpoint of FLM-101B will be
open-sourced at https://huggingface.co/CofeAI/FLM-101B.
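
The abstract attributes the cost reduction to a growth strategy, i.e., starting training at a smaller model size and expanding the model to continue training at the target size. The paper's exact expansion operators are not described here; the snippet below is only a minimal sketch of one common function-preserving width-growth step, with layer sizes and the zero-padding scheme chosen purely for illustration.

```python
# Minimal sketch of a width-growth step (not the authors' exact method): a trained
# small Linear layer is copied into a wider one so training can resume at the larger
# size without discarding what the small model learned.
import torch
import torch.nn as nn

def grow_linear(small: nn.Linear, new_in: int, new_out: int) -> nn.Linear:
    """Expand a trained Linear layer; new rows/columns start at zero so the wider
    layer initially reproduces the small layer's outputs on the original dimensions."""
    big = nn.Linear(new_in, new_out)
    with torch.no_grad():
        big.weight.zero_()
        big.bias.zero_()
        big.weight[: small.out_features, : small.in_features] = small.weight
        big.bias[: small.out_features] = small.bias
    return big

# Usage sketch: train cheaply at hidden size 1024, then grow to 2048 and keep training.
small = nn.Linear(1024, 1024)
# ... train `small` as part of a smaller transformer ...
big = grow_linear(small, 2048, 2048)  # resume training with the wider layer
```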
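
The IQ-style benchmark is said to rely on symbolic mapping to minimize the effect of memorization. As an illustration of that general idea (the prompt format and symbols below are assumptions, not the paper's benchmark), class labels in a few-shot prompt can be replaced with arbitrary symbols so a model cannot answer from memorized label names alone.

```python
# Minimal sketch of symbolic mapping for evaluation: labels are remapped to random
# symbols, so the model must infer the mapping from the in-context examples.
import random

def symbolic_mapping_prompt(examples, query, labels):
    """Build a few-shot prompt where each label is replaced by a random symbol."""
    symbols = random.sample(["<@#$>", "<&*%>", "<!~+>", "<?^=>"], k=len(labels))
    mapping = dict(zip(labels, symbols))
    lines = [f"Text: {text}\nLabel: {mapping[label]}" for text, label in examples]
    lines.append(f"Text: {query}\nLabel:")
    return "\n\n".join(lines), mapping

prompt, mapping = symbolic_mapping_prompt(
    examples=[("The movie was wonderful.", "positive"),
              ("I hated every minute.", "negative")],
    query="A delightful and moving film.",
    labels=["positive", "negative"],
)
# The model is scored on whether it outputs mapping["positive"] for the query.
```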