LLaMA Pro: Progressive LLaMA with Block Expansion
January 4, 2024
Authors: Chengyue Wu, Yukang Gan, Yixiao Ge, Zeyu Lu, Jiahao Wang, Ye Feng, Ping Luo, Ying Shan
cs.AI
Abstract
Humans generally acquire new skills without compromising the old; however, the opposite holds for Large Language Models (LLMs), e.g., from LLaMA to CodeLLaMA. To this end, we propose a new post-pretraining method for LLMs based on an expansion of Transformer blocks. We tune the expanded blocks using only the new corpus, efficiently and effectively improving the model's knowledge without catastrophic forgetting. In this paper, we experiment on corpora of code and math, yielding LLaMA Pro-8.3B, a versatile foundation model initialized from LLaMA2-7B that excels in general tasks, programming, and mathematics. LLaMA Pro and its instruction-following counterpart (LLaMA Pro-Instruct) achieve advanced performance across various benchmarks, demonstrating superiority over existing open models in the LLaMA family and immense potential for reasoning and addressing diverse tasks as an intelligent agent. Our findings provide valuable insights into integrating natural and programming languages, laying a solid foundation for developing advanced language agents that operate effectively in various environments.
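
The abstract describes the method only at a high level: copies of existing Transformer blocks are interleaved into the network, initialized so they act as an identity mapping, and only these copies are trained on the new corpus while the original weights stay frozen. The following is a minimal sketch of that idea, assuming the Hugging Face transformers LLaMA implementation; the function name expand_blocks, the blocks_per_group parameter, and the choice of which projections to zero-initialize are illustrative assumptions, not the authors' released code.

```python
# Sketch of block expansion for a LLaMA-style decoder (illustrative, not the
# authors' implementation). Assumes the Hugging Face `transformers` LLaMA
# model, where model.model.layers is an nn.ModuleList of decoder blocks with
# attention output projection `self_attn.o_proj` and MLP output projection
# `mlp.down_proj`.
import copy

import torch
from transformers import LlamaConfig, LlamaForCausalLM


def expand_blocks(model: LlamaForCausalLM, blocks_per_group: int = 4) -> LlamaForCausalLM:
    """Insert one identity-initialized copy of a block after every
    `blocks_per_group` original blocks and mark only the copies as trainable."""
    # Freeze everything first: embeddings, final norm, lm_head, original blocks.
    model.requires_grad_(False)

    expanded = torch.nn.ModuleList()
    for i, layer in enumerate(model.model.layers):
        expanded.append(layer)
        if (i + 1) % blocks_per_group == 0:
            extra = copy.deepcopy(layer)  # start from the preceding block's weights
            # Zero the output projections so the new block initially contributes
            # nothing to the residual stream: at initialization the expanded
            # model computes the same function as the original one.
            torch.nn.init.zeros_(extra.self_attn.o_proj.weight)
            torch.nn.init.zeros_(extra.mlp.down_proj.weight)
            extra.requires_grad_(True)  # only the expanded blocks get tuned
            expanded.append(extra)
            # (A full implementation would also renumber the copied layers'
            # cache/layer indices before generation with a KV cache.)

    model.model.layers = expanded
    model.config.num_hidden_layers = len(expanded)
    return model


if __name__ == "__main__":
    # Tiny random config so the sketch runs without downloading LLaMA2-7B.
    cfg = LlamaConfig(hidden_size=64, intermediate_size=128,
                      num_hidden_layers=8, num_attention_heads=4, vocab_size=1000)
    model = expand_blocks(LlamaForCausalLM(cfg), blocks_per_group=4)
    print(sum(p.requires_grad for p in model.parameters()), "trainable weight tensors")
```

With LLaMA2-7B's 32 decoder blocks and one copy inserted per group of four, this scheme yields 40 blocks, which is roughly consistent with the growth from 7B to the 8.3B parameters mentioned in the abstract; because the copies start as identities and the original weights never change, the base model's behavior is preserved at the start of post-pretraining, which is how the method avoids catastrophic forgetting.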