

OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

April 22, 2024
Authors: Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari
cs.AI

Abstract

The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. For example, with a parameter budget of approximately one billion parameters, OpenELM exhibits a 2.36% improvement in accuracy compared to OLMo while requiring 2x fewer pre-training tokens. Diverging from prior practices that only provide model weights and inference code, and pre-train on private datasets, our release includes the complete framework for training and evaluation of the language model on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations. We also release code to convert models to the MLX library for inference and fine-tuning on Apple devices. This comprehensive release aims to empower and strengthen the open research community, paving the way for future open research endeavors. Our source code along with pre-trained model weights and training recipes is available at https://github.com/apple/corenet. Additionally, OpenELM models can be found on HuggingFace at: https://huggingface.co/apple/OpenELM.
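
The layer-wise scaling mentioned in the abstract assigns each transformer layer its own attention and FFN width, rather than repeating one uniform block, so the parameter budget is spent where it helps most. Below is a minimal sketch of that idea; the interpolation ranges, rounding rules, and names (`alpha`, `beta`, `head_dim`) are illustrative assumptions, not OpenELM's released configuration (see the CoreNet repository for the actual configs).

```python
# Minimal sketch of layer-wise (non-uniform) parameter allocation across
# transformer layers. The alpha/beta ranges and rounding below are
# illustrative placeholders, not OpenELM's exact released settings.

def layer_wise_dims(num_layers, d_model, head_dim,
                    alpha=(0.5, 1.0), beta=(2.0, 4.0)):
    """Return (num_heads, ffn_dim) for each layer.

    Early layers get fewer attention heads and a narrower FFN; later
    layers get more, interpolated linearly over the depth of the model.
    """
    configs = []
    for i in range(num_layers):
        t = i / max(num_layers - 1, 1)            # 0.0 at first layer, 1.0 at last
        a = alpha[0] + (alpha[1] - alpha[0]) * t  # attention width multiplier
        b = beta[0] + (beta[1] - beta[0]) * t     # FFN width multiplier
        num_heads = max(1, round(a * d_model / head_dim))
        ffn_dim = round(b * d_model)
        configs.append((num_heads, ffn_dim))
    return configs

if __name__ == "__main__":
    for layer, (heads, ffn) in enumerate(layer_wise_dims(4, d_model=2048, head_dim=64)):
        print(f"layer {layer}: {heads} heads, FFN dim {ffn}")
```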

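For readers who want to try the released checkpoints, the following is a hedged loading sketch using the HuggingFace transformers library. The model and tokenizer identifiers are assumptions for illustration; consult https://huggingface.co/apple/OpenELM for the exact released names and the tokenizer each model card requires.

```python
# Hedged sketch of loading an OpenELM checkpoint from HuggingFace with the
# transformers library. The model ID and tokenizer ID below are assumptions;
# check https://huggingface.co/apple/OpenELM for the released identifiers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "apple/OpenELM-1_1B"            # assumed checkpoint name
tokenizer_id = "meta-llama/Llama-2-7b-hf"  # assumed tokenizer per model card

tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Once upon a time there was", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```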