CosineGate: Semantic Dynamic Routing via Cosine Incompatibility in Residual Networks
December 21, 2025
Author: Yogeswar Reddy Thota
cs.AI
Abstract
Modern deep residual networks perform substantial redundant computation by evaluating all residual blocks for every input, even when identity mappings suffice. We introduce CosineGate, an end-to-end differentiable architecture for dynamic routing in residual networks that uses cosine incompatibility between identity and residual feature representations as a self-supervised skip signal. CosineGate measures semantic redundancy through the Cosine Incompatibility Ratio (CIR), defined as 1 - cos(x, F(x)), and uses Gumbel-Softmax relaxation to enable per-sample, per-block gating during training. A progressive FLOPs regularization term controls average compute usage without destabilizing optimization. On CIFAR-10, CosineGate spans the accuracy-efficiency Pareto frontier: an aggressive configuration achieves 89.9 percent accuracy with 24.1 percent FLOPs savings, a balanced configuration achieves 91.3 percent accuracy with 28.5 percent savings at epoch 160, and a conservative configuration reaches a peak of 93.2 percent accuracy with minimal compute reduction. The balanced and conservative configurations match or exceed the ResNet-20 baseline (91.3 percent) while reducing computation, without auxiliary supervision, distillation, or task-specific heuristics. Our results demonstrate that simple geometric measures of feature incompatibility provide a principled and effective signal for dynamic residual routing.
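To make the gating signal concrete, the following is a minimal pure-Python sketch of the two ingredients the abstract names: the CIR, computed as 1 - cos(x, F(x)), and a Gumbel-Softmax relaxation over a two-way execute/skip choice. The abstract does not say how the CIR is turned into gate logits, so the linear mapping in `soft_gate`, its `scale` and `threshold` parameters, and the assumption that a low CIR favors skipping are all hypothetical, as are the function names. The progressive FLOPs regularizer is not shown.

```python
import math
import random


def cosine_incompatibility(x, fx, eps=1e-8):
    """CIR = 1 - cos(x, F(x)) for two flattened feature vectors."""
    dot = sum(a * b for a, b in zip(x, fx))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_fx = math.sqrt(sum(b * b for b in fx))
    # eps guards against division by zero for all-zero features.
    return 1.0 - dot / (norm_x * norm_fx + eps)


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise via inverse transform; keep u away from 0.
        u = max(rng.random(), 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    # Numerically stable softmax.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def soft_gate(x, fx, scale=5.0, threshold=0.5, tau=1.0, rng=random):
    """Return (CIR, soft probability of executing the residual block).

    ASSUMPTION: the logit mapping below is illustrative only. It treats a
    low CIR (F(x) nearly aligned with x) as a reason to skip the block.
    """
    cir = cosine_incompatibility(x, fx)
    execute_logit = scale * (cir - threshold)  # hypothetical mapping
    execute_prob, _skip_prob = gumbel_softmax([execute_logit, 0.0], tau, rng)
    return cir, execute_prob


def gated_block(x, fx, gate):
    """Blend in the residual branch by `gate`; gate = 0 is the identity path."""
    return [a + gate * b for a, b in zip(x, fx)]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [1.0, 0.5, -0.2]
    fx = [0.9, 0.6, -0.1]  # stand-in for a residual branch output F(x)
    cir, gate = soft_gate(x, fx, rng=rng)
    print("CIR:", cir, "gate:", gate, "output:", gated_block(x, fx, gate))
```

In this sketch the gate stays a continuous value in (0, 1), which is what keeps the routing decision differentiable during training; lowering `tau` pushes the sample toward a hard execute-or-skip decision.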