
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

April 22, 2024
Authors: Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari
cs.AI

Abstract

The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. For example, with a parameter budget of approximately one billion parameters, OpenELM exhibits a 2.36% improvement in accuracy compared to OLMo while requiring 2× fewer pre-training tokens. Diverging from prior practices that only provide model weights and inference code, and pre-train on private datasets, our release includes the complete framework for training and evaluation of the language model on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations. We also release code to convert models to the MLX library for inference and fine-tuning on Apple devices. This comprehensive release aims to empower and strengthen the open research community, paving the way for future open research endeavors. Our source code, along with pre-trained model weights and training recipes, is available at https://github.com/apple/corenet. Additionally, OpenELM models can be found on HuggingFace at: https://huggingface.co/apple/OpenELM.
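
The abstract attributes OpenELM's accuracy gains to a layer-wise scaling strategy that varies the parameter budget across transformer layers rather than keeping every layer identical. The sketch below is a minimal illustration of that idea, assuming a simple linear interpolation of attention-head count and FFN width from the first layer to the last; the ranges, rounding rules, and function name here are illustrative assumptions, not the paper's exact formulation.

```python
import math

def layer_wise_scaling(num_layers: int,
                       model_dim: int,
                       head_dim: int = 64,
                       alpha: tuple = (0.5, 1.0),   # assumed head-count scaling range
                       beta: tuple = (0.5, 4.0)):   # assumed FFN width multiplier range
    """Illustrative per-layer parameter allocation.

    Instead of giving every transformer layer the same number of attention
    heads and the same FFN width, interpolate both linearly from the first
    layer to the last, so early layers are slimmer and later layers wider.
    """
    configs = []
    for i in range(num_layers):
        t = i / max(num_layers - 1, 1)  # 0.0 at the first layer, 1.0 at the last
        a = alpha[0] + t * (alpha[1] - alpha[0])
        b = beta[0] + t * (beta[1] - beta[0])
        num_heads = max(1, int(math.ceil(a * model_dim / head_dim)))
        ffn_dim = int(math.ceil(b * model_dim / 16) * 16)  # round up to a multiple of 16
        configs.append({"layer": i, "num_heads": num_heads, "ffn_dim": ffn_dim})
    return configs

# Example: a 4-layer, 1024-dim model gets progressively wider layers.
for cfg in layer_wise_scaling(num_layers=4, model_dim=1024):
    print(cfg)
```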
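
The abstract also points to released checkpoints on HuggingFace. The snippet below is a hedged sketch of running inference on one of them with the Hugging Face transformers library; the exact checkpoint name, the trust_remote_code requirement, and the Llama-2 tokenizer pairing are assumptions drawn from the model cards and should be verified there.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint name; see https://huggingface.co/apple/OpenELM for the
# released sizes. The custom model code requires trust_remote_code=True.
model_id = "apple/OpenELM-1_1B"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# The OpenELM model cards pair the checkpoints with a Llama-2 tokenizer
# (a gated repo, so access must be granted); assumed here.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Reproducibility in language modeling matters because",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```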
