MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
May 20, 2024
Authors: Ting Jiang, Shaohan Huang, Shengyue Luo, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang, Deqing Wang, Fuzhen Zhuang
cs.AI
Abstract
Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models. In this paper, we analyze the impact of low-rank updating as implemented in LoRA. Our findings suggest that the low-rank updating mechanism may limit the ability of LLMs to effectively learn and memorize new knowledge. Inspired by this observation, we propose a new method called MoRA, which employs a square matrix to achieve high-rank updating while maintaining the same number of trainable parameters. To achieve this, we introduce corresponding non-parameterized operators that reduce the input dimension and increase the output dimension for the square matrix. Furthermore, these operators ensure that the weight can be merged back into the LLM, which allows our method to be deployed like LoRA. We perform a comprehensive evaluation of our method across five tasks: instruction tuning, mathematical reasoning, continual pretraining, memory, and pretraining. Our method outperforms LoRA on memory-intensive tasks and achieves comparable performance on the other tasks.
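The abstract describes replacing LoRA's low-rank pair with a single trainable square matrix, flanked by non-parameterized compress and decompress operators so the adapter keeps LoRA's parameter budget and can still be merged into the frozen weight. Below is a minimal PyTorch sketch of that idea; the class name `MoRALayerSketch`, the truncation/zero-padding operators, and the rank arithmetic are illustrative assumptions rather than the paper's exact implementation (the paper studies several such operators).

```python
import math
import torch
import torch.nn as nn

class MoRALayerSketch(nn.Module):
    """Minimal sketch of a MoRA-style adapter (illustrative, not the paper's code).

    A single trainable square matrix M (r_hat x r_hat) replaces LoRA's low-rank
    pair (A, B). Non-parameterized operators shrink the input to r_hat features
    before M and expand M's output back to the layer's output dimension, so the
    update can still be merged into the frozen weight.
    """

    def __init__(self, base_linear: nn.Linear, lora_rank: int):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():      # the pretrained weight stays frozen
            p.requires_grad_(False)
        d_in, d_out = base_linear.in_features, base_linear.out_features
        # Match LoRA's budget: r * (d_in + d_out) trainable params ~= r_hat ** 2.
        self.r_hat = min(int(math.sqrt(lora_rank * (d_in + d_out))), d_in, d_out)
        self.M = nn.Parameter(torch.zeros(self.r_hat, self.r_hat))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.base(x)                      # frozen path: x @ W.T + b
        x_c = x[..., : self.r_hat]            # compress: truncate input (assumed operator)
        h = x_c @ self.M.T                    # square-matrix update, up to rank r_hat
        pad = torch.zeros(*h.shape[:-1], y.shape[-1] - self.r_hat,
                          dtype=h.dtype, device=h.device)
        return y + torch.cat([h, pad], dim=-1)  # decompress: zero-pad to d_out (assumed operator)

    @torch.no_grad()
    def merged_weight(self) -> torch.Tensor:
        """Fold the update into a dense (d_out, d_in) matrix, as with LoRA merging."""
        delta = torch.zeros_like(self.base.weight)
        delta[: self.r_hat, : self.r_hat] = self.M
        return self.base.weight + delta


# Usage: wrap a projection, train only M, then merge for deployment.
layer = MoRALayerSketch(nn.Linear(1024, 1024), lora_rank=8)   # r_hat = 128 here
out = layer(torch.randn(2, 16, 1024))                         # same shape as the base output
W_merged = layer.merged_weight()                              # dense weight for inference
```

As in LoRA, only `M` is trained, and because both operators are fixed linear maps, the update folds back into a single dense weight at deployment time, which is what lets the method be deployed like LoRA.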