Aquila2 Technical Report
August 14, 2024
Authors: Bo-Wen Zhang, Liangdong Wang, Jijie Li, Shuhao Gu, Xinya Wu, Zhengduo Zhang, Boyan Gao, Yulong Ao, Guang Liu
cs.AI
Abstract
This paper introduces the Aquila2 series, a family of bilingual models with
7, 34, and 70 billion parameters. These models are trained with an innovative
framework named HeuriMentor (HM), which
offers real-time insights into model convergence and enhances the training
process and data management. The HM System, comprising the Adaptive Training
Engine (ATE), Training State Monitor (TSM), and Data Management Unit (DMU),
allows for precise monitoring of the model's training progress and enables
efficient optimization of data distribution, thereby enhancing training
effectiveness. Extensive evaluations show that the Aquila2 model series
performs competitively on both English and Chinese benchmarks. Specifically,
Aquila2-34B demonstrates only a slight decrease in performance when quantized
to Int4. Furthermore, we have made our training code
(https://github.com/FlagOpen/FlagScale) and model weights
(https://github.com/FlagAI-Open/Aquila2) publicly available to support ongoing
research and the development of applications.
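
To make the HM workflow concrete, below is a minimal, hypothetical sketch of the feedback loop the abstract describes: the Training State Monitor (TSM) watches for stalled convergence and the Data Management Unit (DMU) re-weights the training data mixture consumed by the Adaptive Training Engine (ATE). The paper's abstract does not publish this API; every class and method name here is illustrative.

```python
# Hypothetical sketch of the HeuriMentor (HM) feedback loop; none of these
# class or method names come from the paper.
from dataclasses import dataclass, field


@dataclass
class TrainingStateMonitor:
    """TSM: tracks the loss curve and flags stalled convergence."""
    window: int = 100
    min_improvement: float = 1e-3
    losses: list = field(default_factory=list)

    def stalled(self, loss: float) -> bool:
        self.losses.append(loss)
        if len(self.losses) < 2 * self.window:
            return False  # not enough history to judge convergence yet
        recent = sum(self.losses[-self.window:]) / self.window
        earlier = sum(self.losses[-2 * self.window:-self.window]) / self.window
        return (earlier - recent) < self.min_improvement  # True => stalled


class DataManagementUnit:
    """DMU: re-weights the sampling mixture when the TSM flags a stall."""

    def __init__(self, mixture: dict):
        self.mixture = dict(mixture)  # e.g. {"en_web": 0.5, "zh_web": 0.5}

    def rebalance(self, per_domain_loss: dict) -> dict:
        # Shift sampling probability toward domains with higher residual loss.
        total = sum(per_domain_loss.values())
        self.mixture = {d: l / total for d, l in per_domain_loss.items()}
        return self.mixture


# The Adaptive Training Engine (ATE) would consume dmu.mixture each step:
#   if tsm.stalled(step_loss):
#       dmu.rebalance(evaluate_per_domain_loss(model))
```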
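As a usage note on the released weights, the following is a minimal sketch of loading Aquila2-34B with 4-bit weight quantization through the Hugging Face transformers/bitsandbytes stack. The abstract does not specify the paper's Int4 quantization method or a checkpoint id, so the model id "BAAI/Aquila2-34B" and the NF4 settings below are assumptions.

```python
# Sketch: load Aquila2-34B with 4-bit (NF4) weight quantization.
# Assumptions: the checkpoint id "BAAI/Aquila2-34B" and trust_remote_code;
# the paper's own Int4 scheme is not described in the abstract.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize linear-layer weights to 4 bits
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bf16
)

tokenizer = AutoTokenizer.from_pretrained("BAAI/Aquila2-34B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "BAAI/Aquila2-34B",
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer("The capital of China is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```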