
Dynamic data sampler for cross-language transfer learning in large language models

May 17, 2024
作者: Yudong Li, Yuhao Feng, Wen Zhou, Zhe Zhao, Linlin Shen, Cheng Hou, Xianxu Hou
cs.AI

Abstract

Large Language Models (LLMs) have gained significant attention in the field of natural language processing (NLP) due to their wide range of applications. However, training LLMs for languages other than English poses significant challenges, due to the difficulty of acquiring large-scale corpora and the requisite computing resources. In this paper, we propose ChatFlow, a cross-language transfer-based LLM, to address these challenges and train large Chinese language models in a cost-effective manner. We employ a mix of Chinese, English, and parallel corpora to continually train the LLaMA2 model, aiming to align cross-language representations and facilitate knowledge transfer, specifically to the Chinese language model. In addition, we use a dynamic data sampler to progressively transition the model from unsupervised pre-training to supervised fine-tuning. Experimental results demonstrate that our approach accelerates model convergence and achieves superior performance. We evaluate ChatFlow on popular Chinese and English benchmarks; the results indicate that it outperforms other Chinese models post-trained on LLaMA-2-7B.
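
The abstract does not specify how the dynamic data sampler schedules the transition. Below is a minimal illustrative sketch of one plausible realization, assuming a linear schedule in which the probability of drawing a supervised fine-tuning (SFT) example grows with training progress; the class name, dataset interfaces, and schedule are assumptions for illustration, not the authors' released implementation.

```python
import random


class DynamicDataSampler:
    """Sketch of a sampler that gradually shifts batches from unsupervised
    pre-training text to supervised fine-tuning (SFT) examples.
    The linear schedule below is an assumption, not the paper's method."""

    def __init__(self, pretrain_data, sft_data, total_steps):
        self.pretrain_data = pretrain_data  # raw-text samples for pre-training
        self.sft_data = sft_data            # instruction/response samples for SFT
        self.total_steps = total_steps

    def sft_ratio(self, step):
        # Linearly anneal the SFT sampling probability from 0 to 1
        # over the course of training.
        return min(step / self.total_steps, 1.0)

    def sample(self, step):
        # Early steps mostly return pre-training text; late steps mostly SFT data.
        if random.random() < self.sft_ratio(step):
            return random.choice(self.sft_data)
        return random.choice(self.pretrain_data)


# Toy usage: at step 500 of 10,000, roughly 5% of drawn samples are SFT examples.
sampler = DynamicDataSampler(
    pretrain_data=["raw Chinese/English/parallel web text ..."],
    sft_data=[{"instruction": "...", "response": "..."}],
    total_steps=10_000,
)
batch = [sampler.sample(step=500) for _ in range(4)]
```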
