Steel-LLM: From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM

February 10, 2025
Authors: Qingshui Gu, Shu Li, Tianyu Zheng, Zhaoxiang Zhang
cs.AI

Abstract

Steel-LLM is a Chinese-centric language model developed from scratch with the goal of creating a high-quality, open-source model despite limited computational resources. Launched in March 2024, the project aimed to train a 1-billion-parameter model on a large-scale dataset, prioritizing transparency and the sharing of practical insights to assist others in the community. Training focused primarily on Chinese data, with a small proportion of English data included. The project addresses a gap in existing open-source LLMs by providing a more detailed and practical account of the model-building process. Steel-LLM has demonstrated competitive performance on benchmarks such as CEVAL and CMMLU, outperforming early models from larger institutions. This paper summarizes the project's key contributions, including data collection, model design, training methodologies, and the challenges encountered along the way, offering a resource for researchers and practitioners looking to develop their own LLMs. The model checkpoints and training script are available at https://github.com/zhanshijinwat/Steel-LLM.
