LLaMA Pro: Progressive LLaMA with Block Expansion
January 4, 2024
Authors: Chengyue Wu, Yukang Gan, Yixiao Ge, Zeyu Lu, Jiahao Wang, Ye Feng, Ping Luo, Ying Shan
cs.AI
Abstract
Humans generally acquire new skills without compromising the old; however,
the opposite holds for Large Language Models (LLMs), e.g., from LLaMA to
CodeLLaMA. To this end, we propose a new post-pretraining method for LLMs with
an expansion of Transformer blocks. We tune the expanded blocks using only the
new corpus, efficiently and effectively improving the model's knowledge without
catastrophic forgetting. In this paper, we experiment on the corpus of code and
math, yielding LLaMA Pro-8.3B, a versatile foundation model initialized from
LLaMA2-7B, excelling in general tasks, programming, and mathematics. LLaMA Pro
and its instruction-following counterpart (LLaMA Pro-Instruct) achieve advanced
performance across various benchmarks, demonstrating superiority over existing
open models in the LLaMA family and immense potential for reasoning and
addressing diverse tasks as an intelligent agent. Our findings provide valuable
insights into integrating natural and programming languages, laying a solid
foundation for developing advanced language agents that operate effectively in
various environments.
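
The block-expansion idea described above can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch example (not the authors' released code), assuming the Hugging Face `transformers` LlamaForCausalLM layout: copied decoder blocks with zero-initialized output projections are interleaved into the stack, the original blocks are frozen, and only the new blocks are left trainable. The expansion interval, the `expand_blocks` helper, and the checkpoint name are illustrative assumptions, and bookkeeping such as per-layer KV-cache indices is glossed over.

```python
# Minimal sketch of block expansion, assuming the Hugging Face `transformers`
# LlamaForCausalLM layout (model.model.layers, each layer with self_attn.o_proj
# and mlp.down_proj). The interval and checkpoint name below are illustrative.
import copy

import torch.nn as nn
from transformers import LlamaForCausalLM


def expand_blocks(model: LlamaForCausalLM, interval: int = 4) -> LlamaForCausalLM:
    """Insert a copy of every `interval`-th decoder block with zero-initialized
    output projections, so each added block starts as an identity mapping."""
    new_layers = nn.ModuleList()
    for i, layer in enumerate(model.model.layers):
        new_layers.append(layer)
        if (i + 1) % interval == 0:
            expanded = copy.deepcopy(layer)
            # Zero the linears that write into the residual stream; with the
            # residual connections, the new block then passes inputs through
            # unchanged at initialization.
            nn.init.zeros_(expanded.self_attn.o_proj.weight)
            nn.init.zeros_(expanded.mlp.down_proj.weight)
            expanded.is_expanded = True  # marker used below for selective training
            new_layers.append(expanded)
    model.model.layers = new_layers
    model.config.num_hidden_layers = len(new_layers)

    # Freeze the original blocks and embeddings; tune only the new blocks,
    # which is what protects the base model from catastrophic forgetting.
    for p in model.parameters():
        p.requires_grad = False
    for layer in model.model.layers:
        if getattr(layer, "is_expanded", False):
            for p in layer.parameters():
                p.requires_grad = True
    return model


# Usage (checkpoint name assumed): expand a LLaMA2-7B-style model, then continue
# pretraining the trainable parameters on the new domain corpus (e.g., code and math).
model = expand_blocks(LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf"))
```

Because the added blocks are identity functions at initialization, the expanded model reproduces the base model's outputs before post-pretraining begins; subsequent training on the new corpus only updates the inserted blocks, leaving the original weights intact.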