LLaMA Pro: Progressive LLaMA with Block Expansion

January 4, 2024
Authors: Chengyue Wu, Yukang Gan, Yixiao Ge, Zeyu Lu, Jiahao Wang, Ye Feng, Ping Luo, Ying Shan
cs.AI

Abstract

Humans generally acquire new skills without compromising the old; however, the opposite holds for Large Language Models (LLMs), e.g., from LLaMA to CodeLLaMA. To this end, we propose a new post-pretraining method for LLMs with an expansion of Transformer blocks. We tune the expanded blocks using only the new corpus, efficiently and effectively improving the model's knowledge without catastrophic forgetting. In this paper, we experiment on corpora of code and math, yielding LLaMA Pro-8.3B, a versatile foundation model initialized from LLaMA2-7B, excelling in general tasks, programming, and mathematics. LLaMA Pro and its instruction-following counterpart (LLaMA Pro-Instruct) achieve advanced performance across various benchmarks, demonstrating superiority over existing open models in the LLaMA family and immense potential for reasoning and addressing diverse tasks as an intelligent agent. Our findings provide valuable insights into integrating natural and programming languages, laying a solid foundation for developing advanced language agents that operate effectively in various environments.
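The abstract describes expanding a Transformer with additional blocks and then tuning only those added blocks on the new corpus, leaving the original weights untouched. Below is a minimal PyTorch sketch of that idea; the toy `Block`, the `expand_blocks`/`freeze_original` helpers, the `group_size` spacing, and the zero-initialization that makes each copied block start as an identity mapping are illustrative assumptions, not details taken from this abstract.

```python
import copy
import torch
import torch.nn as nn

class Block(nn.Module):
    """Toy stand-in for a decoder Transformer block (attention omitted for brevity)."""
    def __init__(self, dim):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        return x + self.ff(self.norm(x))

def expand_blocks(blocks, group_size=4):
    """Interleave one new block after every `group_size` original blocks.

    Each new block is a copy of the preceding one whose output projection is
    zero-initialized (an assumption here), so at initialization it acts as an
    identity mapping and the expanded model reproduces the original outputs.
    """
    expanded = nn.ModuleList()
    for i, blk in enumerate(blocks):
        expanded.append(blk)
        if (i + 1) % group_size == 0:
            new_blk = copy.deepcopy(blk)
            nn.init.zeros_(new_blk.ff[-1].weight)  # zero the last projection
            nn.init.zeros_(new_blk.ff[-1].bias)
            new_blk.is_expanded = True             # mark for selective tuning
            expanded.append(new_blk)
    return expanded

def freeze_original(expanded):
    """Train only the newly added blocks; freeze everything inherited from the base model."""
    for blk in expanded:
        trainable = getattr(blk, "is_expanded", False)
        for p in blk.parameters():
            p.requires_grad = trainable

# Usage: expand a toy 8-block stack to 10 blocks and tune only the 2 new ones.
dim = 64
base = nn.ModuleList(Block(dim) for _ in range(8))
model = expand_blocks(base, group_size=4)
freeze_original(model)
x = torch.randn(2, 16, dim)
for blk in model:
    x = blk(x)
print(len(model), sum(p.requires_grad for p in model.parameters()))
```

Because the copied blocks start as identities, the expanded model behaves exactly like the base model before post-pretraining; continued training on the new corpus then updates only the added blocks, which is how the method avoids catastrophic forgetting of the frozen base.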