

Deep Delta Learning

January 1, 2026
Authors: Yifan Zhang, Yifeng Liu, Mengdi Wang, Quanquan Gu
cs.AI

Abstract

The efficacy of deep residual networks is fundamentally predicated on the identity shortcut connection. While this mechanism effectively mitigates the vanishing gradient problem, it imposes a strictly additive inductive bias on feature transformations, thereby limiting the network's capacity to model complex state transitions. In this paper, we introduce Deep Delta Learning (DDL), a novel architecture that generalizes the standard residual connection by modulating the identity shortcut with a learnable, data-dependent geometric transformation. This transformation, termed the Delta Operator, constitutes a rank-1 perturbation of the identity matrix, parameterized by a reflection direction vector k(X) and a gating scalar β(X). We provide a spectral analysis of this operator, demonstrating that the gate β(X) enables dynamic interpolation between identity mapping, orthogonal projection, and geometric reflection. Furthermore, we restructure the residual update as a synchronous rank-1 injection, where the gate acts as a dynamic step size governing both the erasure of old information and the writing of new features. This unification empowers the network to explicitly control the spectrum of its layer-wise transition operator, enabling the modeling of complex, non-monotonic dynamics while preserving the stable training characteristics of gated residual architectures.
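
The abstract does not state the operator explicitly, but "a rank-1 perturbation of the identity matrix, parameterized by a reflection direction vector k(X) and a gating scalar β(X)" that interpolates between identity, projection, and reflection is consistent with the Householder-style form sketched below. The symbol D and the eigenvalue statements are a reconstruction from the abstract, not formulas quoted from the paper.

```latex
% Plausible form of the Delta Operator (notation assumed, not quoted from the paper):
\[
  D(X) \;=\; I \;-\; \beta(X)\, k(X)\, k(X)^{\top},
  \qquad \lVert k(X) \rVert_2 = 1 .
\]
% Spectrum: eigenvalue 1 on the (d-1)-dimensional subspace orthogonal to k(X),
% and eigenvalue 1 - beta(X) along k(X). Hence beta = 0 gives the identity,
% beta = 1 an orthogonal projection that annihilates the k(X) component,
% and beta = 2 a geometric (Householder) reflection.
```

A minimal PyTorch-style sketch of a block built around this operator follows. One reading of the "synchronous rank-1 injection" is that the same gate β(X) both erases the input's component along k(X) and writes the corresponding component of the new features, acting as a dynamic step size. The module name DeltaBlock, the layer shapes, the two-layer MLP for the new-feature branch, and the parameterization of β(X) as 2·sigmoid(·) are assumptions for illustration; the paper's actual architecture may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeltaBlock(nn.Module):
    """Hypothetical sketch of a Deep Delta Learning block (not the paper's code).

    Shortcut: (I - beta(x) k(x) k(x)^T) x   -- the assumed Delta Operator
    Update:   beta(x) <v(x), k(x)> k(x)     -- rank-1 write along k(x)
    """

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.k_proj = nn.Linear(dim, dim)    # produces the reflection direction k(X)
        self.beta_proj = nn.Linear(dim, 1)   # produces the gating scalar beta(X)
        self.value = nn.Sequential(          # residual branch producing new features v(X)
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., dim) token features
        k = F.normalize(self.k_proj(x), dim=-1)        # unit-norm direction k(X)
        beta = 2.0 * torch.sigmoid(self.beta_proj(x))  # gate in (0, 2): identity -> projection -> reflection

        # Erase: remove a beta-scaled fraction of x's component along k
        shortcut = x - beta * (x * k).sum(dim=-1, keepdim=True) * k

        # Write: inject the new features' component along k with the same step size beta
        v = self.value(x)
        update = beta * (v * k).sum(dim=-1, keepdim=True) * k

        return shortcut + update
```

In this sketch, β(X) = 0 reduces the block to a plain identity shortcut, β(X) = 1 fully replaces the coordinate of x along k(X) with that of the new features, and β(X) = 2 reflects the old coordinate before the write, giving the layer-wise transition operator eigenvalues below zero that a purely additive residual update cannot produce.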