LLM Augmented LLMs: Expanding Capabilities through Composition
January 4, 2024
Authors: Rachit Bansal, Bidisha Samanta, Siddharth Dalmia, Nitish Gupta, Shikhar Vashishth, Sriram Ganapathy, Abhishek Bapna, Prateek Jain, Partha Talukdar
cs.AI
Abstract
Foundational models with billions of parameters which have been trained on
large corpora of data have demonstrated non-trivial skills in a variety of
domains. However, due to their monolithic structure, it is challenging and
expensive to augment them or impart new skills. On the other hand, due to their
adaptation abilities, several new instances of these models are being trained
towards new domains and tasks. In this work, we study the problem of efficient
and practical composition of existing foundation models with more specific
models to enable newer capabilities. To this end, we propose CALM --
Composition to Augment Language Models -- which introduces cross-attention
between models to compose their representations and enable new capabilities.
Salient features of CALM are: (i) Scales up LLMs on new tasks by 're-using'
existing LLMs along with a few additional parameters and data, (ii) Existing
model weights are kept intact, and hence preserves existing capabilities, and
(iii) Applies to diverse domains and settings. We illustrate that augmenting
PaLM2-S with a smaller model trained on low-resource languages results in an
absolute improvement of up to 13% on tasks like translation into English and
arithmetic reasoning for low-resource languages. Similarly, when PaLM2-S is
augmented with a code-specific model, we see a relative improvement of 40%
over the base model for code generation and explanation tasks -- on-par with
fully fine-tuned counterparts.
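The composition the abstract describes can be pictured as a learned cross-attention bridge: the anchor model's hidden states attend to the augmenting model's hidden states, while both models' weights stay frozen and only the bridge parameters train. The sketch below is a minimal, hypothetical illustration of that idea in PyTorch (the class name, dimensions, and the stand-in "models" are assumptions, not the paper's actual architecture or hyperparameters):

```python
import torch
import torch.nn as nn

class CrossAttentionBridge(nn.Module):
    """Hypothetical minimal cross-attention bridge in the spirit of CALM:
    the anchor model's representations attend to the augmenting model's."""
    def __init__(self, d_anchor, d_aug, n_heads=4):
        super().__init__()
        # Project augmenting-model states into the anchor model's width.
        self.proj = nn.Linear(d_aug, d_anchor)
        self.attn = nn.MultiheadAttention(d_anchor, n_heads, batch_first=True)

    def forward(self, h_anchor, h_aug):
        h_aug = self.proj(h_aug)
        # Anchor states query the augmenting states; residual composition.
        out, _ = self.attn(query=h_anchor, key=h_aug, value=h_aug)
        return h_anchor + out

# Toy frozen "models": single linear layers standing in for LLM layer stacks.
torch.manual_seed(0)
anchor = nn.Linear(32, 32)
aug = nn.Linear(16, 16)
for p in list(anchor.parameters()) + list(aug.parameters()):
    p.requires_grad_(False)  # existing model weights are kept intact

bridge = CrossAttentionBridge(d_anchor=32, d_aug=16)  # only these params train

x_anchor = torch.randn(2, 10, 32)  # (batch, seq, dim) for the anchor model
x_aug = torch.randn(2, 12, 16)     # the augmenting model may have its own seq len
h = bridge(anchor(x_anchor), aug(x_aug))
print(h.shape)  # torch.Size([2, 10, 32])
```

Training only the bridge (a few additional parameters) is what lets the composed system gain new capabilities without touching, and hence without degrading, either frozen model.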