AdaPreLoRA: Adafactor Preconditioned Low-Rank Adaptation

Abstract

Low-Rank Adaptation (LoRA) reparameterizes a weight update as the product of two low-rank factors. The Jacobian $J_{G}$ of the generator that maps the factors to the weight matrix is rank-deficient. As a result, the factor-space preconditioner $J_{G}^* \mathbf{F}_t J_{G}$ induced by any $\mathbf{W}$-space preconditioner $\mathbf{F}_t$ is singular, and the standard chain rule cannot be uniquely inverted to map a preconditioned $\mathbf{W}$-space direction back to a factor-space update.

We cast existing LoRA optimizers in a unified framework parameterized by two choices:
(i) which invertible surrogate to use for $J_{G}^* \mathbf{F}_t J_{G}$;
(ii) which preconditioner $\mathbf{F}_t$ to use on $\mathbf{W}$.
Along these axes, existing methods fall into four families: factor-space adaptive updates, block-diagonal surrogates for $J_{G}^* J_{G}$, Frobenius-residual pseudoinverse methods, and Riemannian manifold constraints. Within this design space, one combination remains underexplored: a gradient-statistics-aware $\mathbf{F}_t$ paired with a closed-form factor-space solve at $O((m+n)r)$ memory.

We propose AdaPreLoRA to fill this gap. It adopts the Adafactor diagonal Kronecker preconditioner $\mathbf{H}_t$ on $\mathbf{W}$. From the resulting family of factor-space solutions, it selects the element that minimizes an $\mathbf{H}_t$-weighted imbalance between the two factor contributions. By construction, the resulting factor update is the closest LoRA approximation to the preconditioned $\mathbf{W}$-space direction under the $\mathbf{H}_t$-weighted norm. We evaluate on GPT-2 (E2E), on Mistral-7B and Qwen2-7B (GLUE, ARC, GSM8K), and on diffusion-model personalization. Across these settings, AdaPreLoRA is competitive with or improves on a representative set of LoRA optimizers. Its peak GPU memory stays at the level of existing LoRA optimizers.
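The rank-deficiency argument and the ingredients named above can be checked numerically on a toy problem. The following NumPy sketch is illustrative and is not the AdaPreLoRA implementation. It makes four assumptions:

- the generator is $G(\mathbf{B}, \mathbf{A}) = \mathbf{B}\mathbf{A}$ with $\mathbf{B} \in \mathbb{R}^{m \times r}$ and $\mathbf{A} \in \mathbb{R}^{r \times n}$;
- the Adafactor second-moment estimate is simplified, with a fixed decay and no bias correction;
- the preconditioner $\mathbf{F} = \mathrm{diag}(\mathrm{vec}\sqrt{\hat V})$ is assumed, along with the matching weighted norm $\|\mathbf{M}\|_{\mathbf{H}}^2 = \sum_{ij} \sqrt{\hat V_{ij}}\, M_{ij}^2$;
- the balancing objective in step 4 is a hypothetical stand-in for the paper's imbalance criterion.

All helper names (`lora_jacobian`, `adafactor_second_moment`, `h_residual`, `h_imbalance`) are invented for this example. The factor-space solve uses a dense least-squares call, so it does not attain the $O((m+n)r)$ memory claimed for the method.

```python
import numpy as np


def vec(M):
    """Column-major vectorization: vec(X @ Y @ Z) == kron(Z.T, X) @ vec(Y)."""
    return M.reshape(-1, order="F")


def lora_jacobian(B, A):
    """Jacobian of the generator G(B, A) = B @ A w.r.t. (vec(B), vec(A))."""
    m = B.shape[0]
    n = A.shape[1]
    J_B = np.kron(A.T, np.eye(m))  # dB -> vec(dB @ A)
    J_A = np.kron(np.eye(n), B)    # dA -> vec(B @ dA)
    return np.hstack([J_B, J_A])   # shape (m*n, (m+n)*r)


def adafactor_second_moment(grads, beta2=0.999, eps1=1e-30):
    """Simplified Adafactor factored estimate V_hat = R C^T / (1^T R).
    No bias correction; only O(m+n) row/column statistics are stored."""
    m, n = grads[0].shape
    R, C = np.zeros(m), np.zeros(n)
    for G in grads:
        G2 = G**2 + eps1
        R = beta2 * R + (1 - beta2) * G2.sum(axis=1)  # row statistics
        C = beta2 * C + (1 - beta2) * G2.sum(axis=0)  # column statistics
    return R, C, np.outer(R, C) / R.sum()


rng = np.random.default_rng(0)
m, n, r = 6, 5, 2
B = rng.standard_normal((m, r))
A = rng.standard_normal((r, n))
J = lora_jacobian(B, A)

# 1) Rank deficiency: gauge directions (dB, dA) = (B X, -X A) give dW = 0,
#    so J has an r*r-dimensional null space.
rank_J = np.linalg.matrix_rank(J)
print(J.shape[1], rank_J)  # 22 parameters, rank (m+n)r - r^2 = 18

# 2) A positive-definite W-space preconditioner (assumed here to be
#    diag(vec(sqrt(V_hat)))) still induces a singular J^T F J.
grads = [rng.standard_normal((m, n)) for _ in range(10)]
R, C, V_hat = adafactor_second_moment(grads)
H_w = np.sqrt(V_hat)  # assumed elementwise weights of the H-norm
F = np.diag(vec(H_w))
print(np.linalg.matrix_rank(J.T @ F @ J))  # 18 < 22: singular
# diag(F) is a Kronecker product of a column diagonal and a row diagonal.
assert np.allclose(np.diag(F), np.kron(np.sqrt(C), np.sqrt(R / R.sum())))

# 3) Weighted least squares toward the Adafactor-style direction U = G / sqrt(V_hat).
#    Dense O(mn(m+n)r) solve, for illustration only.
U = grads[-1] / H_w
w = np.sqrt(vec(H_w))  # row scaling so that ||w * res||^2 = ||res||_H^2
x, *_ = np.linalg.lstsq(w[:, None] * J, w * vec(U), rcond=None)
dB = x[: m * r].reshape(m, r, order="F")
dA = x[m * r :].reshape(r, n, order="F")


def h_residual(dB, dA):
    """H-weighted squared distance between the LoRA step and U."""
    return np.sum(H_w * (dB @ A + B @ dA - U) ** 2)


def h_imbalance(dB, dA):
    """H-weighted squared difference of the two factor contributions."""
    return np.sum(H_w * (dB @ A - B @ dA) ** 2)


# Moving along a gauge direction keeps the residual but re-splits the step.
X = rng.standard_normal((r, r))
dB2, dA2 = dB + B @ X, dA - X @ A
print(np.isclose(h_residual(dB, dA), h_residual(dB2, dA2)))  # True

# 4) Hypothetical selection rule: choose X minimizing ||dB' A - B dA'||_H.
#    This is linear least squares in X because
#    dB' A - B dA' = (dB A - B dA) + 2 B X A.
K = np.kron(A.T, B)  # vec(B X A) = K @ vec(X)
d = vec(dB @ A - B @ dA)
xi, *_ = np.linalg.lstsq(2 * w[:, None] * K, -w * d, rcond=None)
Xs = xi.reshape(r, r, order="F")
dB_bal, dA_bal = dB + B @ Xs, dA - Xs @ A
print(h_imbalance(dB, dA), h_imbalance(dB_bal, dA_bal))
```

Step 1 exposes the null space $\{(\mathbf{B}\mathbf{X}, -\mathbf{X}\mathbf{A})\}$ of dimension $r^2$, which makes $J_G$ rank-deficient. Step 3 shows why that matters. Every point in this null space gives the same weighted residual, so a pseudoinverse-style solve yields a family of equally good factor updates rather than a unique one. A further selection rule, such as the toy balancing objective in step 4, is therefore needed to pick a single update.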
