
LLaMA Pro: Progressive LLaMA with Block Expansion

January 4, 2024
Authors: Chengyue Wu, Yukang Gan, Yixiao Ge, Zeyu Lu, Jiahao Wang, Ye Feng, Ping Luo, Ying Shan
cs.AI

Abstract

Humans generally acquire new skills without compromising the old; however, the opposite holds for Large Language Models (LLMs), e.g., from LLaMA to CodeLLaMA. To this end, we propose a new post-pretraining method for LLMs with an expansion of Transformer blocks. We tune the expanded blocks using only the new corpus, efficiently and effectively improving the model's knowledge without catastrophic forgetting. In this paper, we experiment on a corpus of code and math, yielding LLaMA Pro-8.3B, a versatile foundation model initialized from LLaMA2-7B that excels in general tasks, programming, and mathematics. LLaMA Pro and its instruction-following counterpart (LLaMA Pro-Instruct) achieve advanced performance across various benchmarks, demonstrating superiority over existing open models in the LLaMA family and immense potential for reasoning and addressing diverse tasks as an intelligent agent. Our findings provide valuable insights into integrating natural and programming languages, laying a solid foundation for developing advanced language agents that operate effectively in various environments.
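To make the block-expansion idea concrete, below is a minimal PyTorch sketch of the procedure the abstract describes: copies of existing decoder blocks are interleaved into the stack with their output projections zero-initialized, so the expanded model is functionally identical to the original at initialization; the original blocks are frozen and only the new copies are tuned on the new corpus. The names `ToyBlock`, `expand_blocks`, and `group_size` are illustrative simplifications introduced here, not the authors' released implementation.

```python
import copy
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    """Simplified decoder block: attention + MLP, each followed by an output projection."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_out = nn.Linear(dim, dim)        # output projection after attention
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU())
        self.mlp_out = nn.Linear(4 * dim, dim)     # output projection after MLP
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        h = self.norm1(x)
        a, _ = self.attn(h, h, h, need_weights=False)
        x = x + self.attn_out(a)                   # residual connection
        x = x + self.mlp_out(self.mlp(self.norm2(x)))
        return x

def expand_blocks(blocks, group_size=4):
    """Insert an identity-initialized copy after every `group_size` blocks,
    freeze the original blocks, and leave only the new copies trainable."""
    expanded = nn.ModuleList()
    for i, blk in enumerate(blocks):
        for p in blk.parameters():
            p.requires_grad = False                # original blocks stay frozen
        expanded.append(blk)
        if (i + 1) % group_size == 0:
            new_blk = copy.deepcopy(blk)
            nn.init.zeros_(new_blk.attn_out.weight)  # zeroing the output projections
            nn.init.zeros_(new_blk.attn_out.bias)    # makes the copy an identity map
            nn.init.zeros_(new_blk.mlp_out.weight)   # at initialization (residual
            nn.init.zeros_(new_blk.mlp_out.bias)     # branches contribute zero)
            for p in new_blk.parameters():
                p.requires_grad = True               # only new blocks get tuned
            expanded.append(new_blk)
    return expanded

blocks = nn.ModuleList(ToyBlock() for _ in range(8))  # stand-in for LLaMA2-7B's 32 blocks
expanded = expand_blocks(blocks, group_size=4)        # 8 -> 10 blocks in this toy setup
x = torch.randn(1, 16, 64)
for blk in expanded:
    x = blk(x)                                        # matches the original stack's output at init
print(len(expanded), x.shape)
```

Because each inserted copy starts out as an identity map, the expanded model reproduces the original model's outputs before any training; post-pretraining on the new corpus then updates only the new blocks, leaving the frozen LLaMA2-7B weights, and the knowledge they encode, untouched.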