Ziya2: Data-centric Learning is All LLMs Need
November 6, 2023
Authors: Ruyi Gan, Ziwei Wu, Renliang Sun, Junyu Lu, Xiaojun Wu, Dixiang Zhang, Kunhao Pan, Ping Yang, Qi Yang, Jiaxing Zhang, Yan Song
cs.AI
Abstract
Various large language models (LLMs) have been proposed in recent years,
including closed- and open-source ones, continually setting new records on
multiple benchmarks. However, the development of LLMs still faces several
issues, such as the high cost of training models from scratch and
catastrophic forgetting caused by continual pre-training. Although many such issues
are addressed along the line of research on LLMs, an important yet practical
limitation is that many studies overly pursue enlarging model sizes without
comprehensively analyzing and optimizing the use of pre-training data in their
learning process, as well as appropriate organization and leveraging of such
data in training LLMs under cost-effective settings. In this work, we propose
Ziya2, a model with 13 billion parameters adopting LLaMA2 as the foundation
model, and further pre-trained on 700 billion tokens, where we focus on
pre-training techniques and use data-centric optimization to enhance the
learning process of Ziya2 at different stages. Experiments show that Ziya2
significantly outperforms other models on multiple benchmarks, with
especially promising results compared to representative open-source ones. Ziya2 (Base) is
released at https://huggingface.co/IDEA-CCNL/Ziya2-13B-Base and
https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Base/summary.