

Learning Rate Matters: Vanilla LoRA May Suffice for LLM Fine-tuning

February 4, 2026
Authors: Yu-Ang Lee, Ching-Yun Ko, Pin-Yu Chen, Mi-Yen Yeh
cs.AI

Abstract

Low-Rank Adaptation (LoRA) is the prevailing approach for efficient large language model (LLM) fine-tuning. Building on this paradigm, recent studies have proposed alternative initialization strategies and architectural modifications, reporting substantial improvements over vanilla LoRA. However, these gains are often demonstrated under fixed or narrowly tuned hyperparameter settings, despite the known sensitivity of neural networks to training configurations. In this work, we systematically re-evaluate four representative LoRA variants alongside vanilla LoRA through extensive hyperparameter searches. Across mathematical reasoning and code generation tasks on diverse model scales, we find that different LoRA methods favor distinct learning rate ranges. Crucially, once learning rates are properly tuned, all methods achieve similar peak performance (within 1-2%), with only subtle rank-dependent behaviors. These results suggest that vanilla LoRA remains a competitive baseline and that improvements reported under a single training configuration may not reflect consistent methodological advantages. Finally, a second-order analysis attributes the differing optimal learning rate ranges to variations in the largest Hessian eigenvalue, aligning with classical learning theory.
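
For readers unfamiliar with the LoRA paradigm discussed above, the sketch below shows a vanilla LoRA linear layer: the pretrained weight stays frozen and only a low-rank update B A, scaled by alpha / r, is trained. This is a minimal illustration under standard assumptions, not the paper's implementation; the class name, rank, scaling, and initialization values are illustrative.

```python
# Minimal sketch of a vanilla LoRA linear layer (frozen base weight W0 plus a
# trainable low-rank update B @ A scaled by alpha / r). Hyperparameters here
# are illustrative defaults, not the settings used in the paper.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # W0 is frozen; only A and B are trained
        # Vanilla LoRA init: A is small random, B is zero, so the update starts at zero.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = base(x) + (alpha / r) * x A^T B^T
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling


# Usage: wrap a pretrained projection and optimize only the LoRA parameters.
layer = LoRALinear(nn.Linear(768, 768))
optimizer = torch.optim.AdamW([layer.A, layer.B], lr=2e-4)  # the learning rate is the tuned quantity
```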
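For context on the final claim, a classical result that the second-order analysis plausibly appeals to (an assumption on our part; the paper may use a different formulation) is the quadratic-model stability bound tying the admissible learning rate to the largest Hessian eigenvalue:

```latex
% Gradient descent on a local quadratic model of the loss L around a minimum w^*:
%   L(w) \approx L(w^*) + \tfrac{1}{2} (w - w^*)^\top H (w - w^*),
% with Hessian H. The update w_{t+1} = w_t - \eta \nabla L(w_t) then satisfies
\[
  w_{t+1} - w^{*} = (I - \eta H)\,(w_t - w^{*}),
\]
% which contracts along every eigendirection of H iff |1 - \eta \lambda_i| < 1 for all i,
% i.e. the learning rate must satisfy
\[
  \eta < \frac{2}{\lambda_{\max}(H)}.
\]
```

Under this view, a reparameterization that changes the largest eigenvalue of the effective loss landscape shifts the range of learning rates that train stably, which is consistent with different LoRA variants favoring different learning rate ranges.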