Aquila2 Technical Report

August 14, 2024
Authors: Bo-Wen Zhang, Liangdong Wang, Jijie Li, Shuhao Gu, Xinya Wu, Zhengduo Zhang, Boyan Gao, Yulong Ao, Guang Liu
cs.AI

Abstract

This paper introduces the Aquila2 series, which comprises a wide range of bilingual models with parameter sizes of 7, 34, and 70 billion. These models are trained based on an innovative framework named HeuriMentor (HM), which offers real-time insights into model convergence and enhances the training process and data management. The HM System, comprising the Adaptive Training Engine (ATE), Training State Monitor (TSM), and Data Management Unit (DMU), allows for precise monitoring of the model's training progress and enables efficient optimization of data distribution, thereby enhancing training effectiveness. Extensive evaluations show that the Aquila2 model series performs comparably well on both English and Chinese benchmarks. Specifically, Aquila2-34B demonstrates only a slight decrease in performance when quantized to Int4. Furthermore, we have made our training code (https://github.com/FlagOpen/FlagScale) and model weights (https://github.com/FlagAI-Open/Aquila2) publicly available to support ongoing research and the development of applications.
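The abstract describes HM as a closed loop between a training engine (ATE), a convergence monitor (TSM), and a data scheduler (DMU). As a rough illustration only, the sketch below shows how such a loop could be wired together; every name and method signature here is hypothetical, since the actual implementation lives in the FlagScale codebase linked above.

```python
# Conceptual sketch of the HeuriMentor (HM) feedback loop. All names below
# are hypothetical illustrations, not the authors' implementation; the real
# training code is in https://github.com/FlagOpen/FlagScale.

def heurimentor_loop(ate, tsm, dmu, data_weights, num_intervals):
    """Alternate training (ATE), convergence monitoring (TSM),
    and data-mixture updates (DMU) over a number of intervals."""
    for _ in range(num_intervals):
        # ATE: run one training interval under the current data mixture
        metrics = ate.train_interval(data_weights)
        # TSM: inspect convergence signals (e.g., loss curves) in real time
        if tsm.convergence_stalled(metrics):
            # DMU: rebalance the distribution of training data sources
            data_weights = dmu.rebalance(data_weights, metrics)
    return data_weights
```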
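The abstract also notes that Aquila2-34B loses little accuracy when quantized to Int4. One minimal way to try 4-bit inference is the bitsandbytes path in Hugging Face Transformers, assuming the released weights are available under the model ID `BAAI/Aquila2-34B` (an assumption; check the linked model repository):

```python
# Minimal sketch (not from the paper): 4-bit inference with Transformers +
# bitsandbytes. The model ID "BAAI/Aquila2-34B" and the trust_remote_code
# flag are assumptions about the published checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit precision
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)

tokenizer = AutoTokenizer.from_pretrained("BAAI/Aquila2-34B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "BAAI/Aquila2-34B",
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer("The capital of China is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```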
