

MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

May 20, 2024
Authors: Ting Jiang, Shaohan Huang, Shengyue Luo, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang, Deqing Wang, Fuzhen Zhuang
cs.AI

Abstract

Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models. In this paper, we analyze the impact of low-rank updating, as implemented in LoRA. Our findings suggest that the low-rank updating mechanism may limit the ability of LLMs to effectively learn and memorize new knowledge. Inspired by this observation, we propose a new method called MoRA, which employs a square matrix to achieve high-rank updating while maintaining the same number of trainable parameters. To achieve this, we introduce corresponding non-parameterized operators that reduce the input dimension and increase the output dimension for the square matrix. Furthermore, these operators ensure that the weight can be merged back into the LLM, allowing our method to be deployed like LoRA. We perform a comprehensive evaluation of our method across five tasks: instruction tuning, mathematical reasoning, continual pretraining, memory, and pretraining. Our method outperforms LoRA on memory-intensive tasks and achieves comparable performance on the other tasks.
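
To make the parameter-budget argument concrete, the sketch below (PyTorch) shows one way a square-matrix adapter of this kind could look. It is an illustrative sketch based only on the abstract, not the authors' implementation: the class name MoRALayer, the sizing rule r_hat = sqrt(2*d*r), and the specific non-parameterized operators chosen here (group-sum compression, tiling decompression) are assumptions for illustration; the paper's own operators may differ.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class MoRALayer(nn.Module):
    """Square-matrix ("high-rank") adapter with roughly the same trainable
    parameter budget as a rank-r LoRA adapter on a d x d weight.
    Illustrative sketch only; operator choices are assumptions."""

    def __init__(self, d: int, lora_rank: int):
        super().__init__()
        # LoRA with rank r on a d x d weight trains 2*d*r parameters (A and B).
        # A single square matrix with the same budget has side r_hat = sqrt(2*d*r).
        self.d = d
        self.r_hat = int(math.sqrt(2 * d * lora_rank))
        # Initialized to zero so the adapter starts as a no-op, as in LoRA.
        self.M = nn.Parameter(torch.zeros(self.r_hat, self.r_hat))

    def compress(self, x: torch.Tensor) -> torch.Tensor:
        # Non-parameterized compression (d -> r_hat): pad the last dimension to a
        # multiple of r_hat, fold it into groups of size r_hat, and sum the groups.
        pad = (-x.shape[-1]) % self.r_hat
        x = F.pad(x, (0, pad))
        return x.reshape(*x.shape[:-1], -1, self.r_hat).sum(dim=-2)

    def decompress(self, y: torch.Tensor) -> torch.Tensor:
        # Non-parameterized decompression (r_hat -> d): tile the output and truncate.
        reps = -(-self.d // self.r_hat)  # ceiling division
        return y.repeat(*([1] * (y.dim() - 1)), reps)[..., : self.d]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Returns the update added on top of the frozen projection W0 @ x.
        return self.decompress(self.compress(x) @ self.M.T)


# Example: adapter for a 4096 x 4096 projection with a LoRA-rank-8 budget.
layer = MoRALayer(d=4096, lora_rank=8)    # r_hat = 256, i.e. 256*256 = 65,536 params
delta = layer(torch.randn(2, 16, 4096))   # same shape as the frozen layer's output
print(delta.shape, layer.r_hat)
```

Because the compression, the multiplication by the square matrix, and the decompression are all linear in the input, their composition is itself a d x d linear map, so after training the update can in principle be folded into the frozen weight; this is the merging property the abstract refers to.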

