Deep Delta Learning
January 1, 2026
Authors: Yifan Zhang, Yifeng Liu, Mengdi Wang, Quanquan Gu
cs.AI
Abstract
The efficacy of deep residual networks is fundamentally predicated on the identity shortcut connection. While this mechanism effectively mitigates the vanishing gradient problem, it imposes a strictly additive inductive bias on feature transformations, thereby limiting the network's capacity to model complex state transitions. In this paper, we introduce Deep Delta Learning (DDL), a novel architecture that generalizes the standard residual connection by modulating the identity shortcut with a learnable, data-dependent geometric transformation. This transformation, termed the Delta Operator, constitutes a rank-1 perturbation of the identity matrix, parameterized by a reflection direction vector k(X) and a gating scalar β(X). We provide a spectral analysis of this operator, demonstrating that the gate β(X) enables dynamic interpolation between identity mapping, orthogonal projection, and geometric reflection. Furthermore, we restructure the residual update as a synchronous rank-1 injection, where the gate acts as a dynamic step size governing both the erasure of old information and the writing of new features. This unification empowers the network to explicitly control the spectrum of its layer-wise transition operator, enabling the modeling of complex, non-monotonic dynamics while preserving the stable training characteristics of gated residual architectures.
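To make the construction concrete, the sketch below instantiates the Delta Operator Δ(X) = I − β(X)·k(X)k(X)ᵀ described in the abstract and numerically checks the stated spectral claim (eigenvalue 1 − β along k, eigenvalue 1 elsewhere). This is a minimal sketch under stated assumptions, not the paper's implementation: the per-token update y = (I − β k kᵀ)x + β v, the sigmoid parameterization of β(X) ∈ (0, 2), and all names (`DeltaBlock`, `k_proj`, `v_proj`, `gate`) are hypothetical illustrations of the abstract's description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def delta_operator(k: torch.Tensor, beta: float) -> torch.Tensor:
    """Delta Operator: rank-1 perturbation of the identity, Delta = I - beta * k k^T.

    With unit-norm k, the spectrum is {1 - beta} along k and {1} on the
    orthogonal complement, so beta = 0 / 1 / 2 yields identity /
    orthogonal projection / geometric (Householder) reflection.
    """
    d = k.shape[-1]
    return torch.eye(d) - beta * torch.outer(k, k)


class DeltaBlock(nn.Module):
    """Hypothetical DDL block: gated Delta shortcut plus a synchronous write,
    with the gate beta(x) acting as a shared dynamic step size."""

    def __init__(self, d_model: int):
        super().__init__()
        self.k_proj = nn.Linear(d_model, d_model)  # reflection direction k(x)
        self.v_proj = nn.Linear(d_model, d_model)  # new features v(x) (assumed form)
        self.gate = nn.Linear(d_model, 1)          # gating scalar beta(x)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        k = F.normalize(self.k_proj(x), dim=-1)    # unit norm so k k^T is a projector
        beta = 2.0 * torch.sigmoid(self.gate(x))   # beta(x) in (0, 2) (assumption)
        v = self.v_proj(x)
        # y = (I - beta k k^T) x + beta v: the same gate scales the erasure of
        # the old component along k and the write of new features.
        erase = beta * (k * x).sum(dim=-1, keepdim=True) * k
        return x - erase + beta * v


if __name__ == "__main__":
    # Numerical check of the spectral interpolation for beta in {0, 1, 2}.
    k = F.normalize(torch.randn(4), dim=-1)
    for beta, name in [(0.0, "identity"), (1.0, "projection"), (2.0, "reflection")]:
        eigvals = torch.linalg.eigvalsh(delta_operator(k, beta))
        print(f"beta={beta} ({name}): spectrum = {eigvals.round(decimals=5).tolist()}")

    y = DeltaBlock(16)(torch.randn(2, 8, 16))      # (batch, seq, d_model)
    print(y.shape)                                  # torch.Size([2, 8, 16])
```

Normalizing k to unit length is what makes k kᵀ an orthogonal projector, giving the clean eigenvalue 1 − β along k while leaving the rest of the spectrum at 1; this is the sense in which the gate controls the layer-wise transition operator's spectrum.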