AdaPreLoRA: Adafactor Preconditioned Low-Rank Adaptation

May 9, 2026
Authors: Ziyun Liu, Fengmiao Bian, Jian-Feng Cai
cs.AI

Abstract

Low-Rank Adaptation (LoRA) reparameterizes a weight update as a product of two low-rank factors, but the Jacobian \(J_G\) of the generator mapping the factors to the weight matrix is rank-deficient, so the factor-space preconditioner \(J_G^* F_t J_G\) induced by any \(W\)-space preconditioner \(F_t\) is singular; consequently, the standard chain rule cannot be uniquely inverted to map a preconditioned \(W\)-space direction back to a factor-space update. We cast existing LoRA optimizers in a unified framework parameterized by two choices: (i) which invertible surrogate for \(J_G^* F_t J_G\) to use, and (ii) which \(F_t\) on \(W\) to use. Existing methods occupy four families along these axes: factor-space adaptive updates, block-diagonal surrogates for \(J_G^* J_G\), Frobenius-residual pseudoinverse methods, and Riemannian manifold constraints. Within this design space, a gradient-statistics-aware \(F_t\) paired with a closed-form factor-space solve at \(O((m+n)r)\) memory remains underexplored. We propose AdaPreLoRA, which fills this gap by adopting the Adafactor diagonal Kronecker preconditioner \(H_t\) on \(W\) and selecting, from the resulting family of factor-space solutions, the element that minimizes an \(H_t\)-weighted imbalance between the two factor contributions; by construction, the resulting factor update is the closest LoRA approximation to the preconditioned \(W\)-space direction under the \(H_t\)-weighted norm. Across GPT-2 (E2E), Mistral-7B and Qwen2-7B (GLUE, ARC, GSM8K), and diffusion-model personalization, AdaPreLoRA is competitive with or improves over a representative set of LoRA optimizers while keeping peak GPU memory at the LoRA-optimizer level.
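To make the setup concrete, the sketch below (NumPy; all function names are illustrative, not from the paper's code) applies Adafactor's factored second-moment estimate to a full gradient \(G \in \mathbb{R}^{m \times n}\) and then pulls the preconditioned direction back to the LoRA factors with the naive chain rule. The invariance of \(W = BA\) under \(B \mapsto BS\), \(A \mapsto S^{-1}A\) is exactly why this pullback is not unique; AdaPreLoRA's \(H_t\)-weighted balanced closed-form solve, which resolves that ambiguity, is not reproduced here.

```python
import numpy as np

def adafactor_direction(G, R, C, beta2=0.999, eps=1e-30):
    """Adafactor's factored second-moment preconditioner (Shazeer & Stern, 2018),
    with update clipping and relative step sizes omitted for brevity.

    G : (m, n) gradient w.r.t. the full weight matrix W.
    R : (m,) running row sums of G**2;  C : (n,) running column sums.
    Only R and C persist between steps, so the second-moment state is
    O(m + n) rather than the O(mn) a full diagonal estimate would need.
    Returns the preconditioned W-space direction G / sqrt(V), where
    V = outer(R, C) / sum(R) is the rank-1 reconstruction of E[G**2].
    """
    G2 = G * G + eps
    R[:] = beta2 * R + (1.0 - beta2) * G2.sum(axis=1)
    C[:] = beta2 * C + (1.0 - beta2) * G2.sum(axis=0)
    V = np.outer(R, C) / R.sum()  # factored (diagonal Kronecker) estimate
    return G / np.sqrt(V)

def naive_factor_pullback(D, A, B):
    """Chain-rule pullback of a W-space direction D to the factors of
    W = W0 + B @ A  (B is m x r, A is r x n):  dB = D @ A.T, dA = B.T @ D.
    Because B @ A is unchanged by B -> B @ S, A -> inv(S) @ A, the Jacobian
    of (A, B) -> B @ A is rank-deficient, and this pullback is only one
    member of a family of factor updates realizing D; AdaPreLoRA instead
    selects the family member minimizing an H_t-weighted imbalance between
    the two factor contributions (not shown here).
    """
    return D @ A.T, B.T @ D

# Toy usage on random data.
m, n, r = 64, 48, 4
rng = np.random.default_rng(0)
B, A = rng.standard_normal((m, r)), rng.standard_normal((r, n))
R, C = np.zeros(m), np.zeros(n)           # Adafactor state, O(m + n)
G = rng.standard_normal((m, n))           # stand-in for dL/dW
D = adafactor_direction(G, R, C)          # preconditioned W-space direction
dB, dA = naive_factor_pullback(D, A, B)   # (m, r) and (r, n) factor updates
```

Note that between steps only the vectors R and C and the factors themselves are stored, which is consistent with the \(O((m+n)r)\) memory footprint the abstract cites for the closed-form factor-space solve.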