The Hamilton-Jacobi Theory of Deep Learning

May 27, 2026
Authors: Jose Marie Antonio Miñoza, Erika Fille T. Legara, Christopher P. Monterola
cs.AI

Abstract

This paper identifies the training of a neural network, exactly, as a search over Hamilton-Jacobi initial-value problems: each gradient step selects the initial data of a viscous Hamilton-Jacobi equation whose Hopf-Cole propagator best fits the observations. At inference, the input is the spatial point at which that solution is evaluated, and the initial condition is already encoded in the weights. The correspondence is exact for log-sum-exp layers and structural for broader architectures: residual networks, transformers, and recurrent architectures (RNNs, LSTMs, SSMs) each discretize the same class of Hamilton-Jacobi equations, with an architecture-dependent Hamiltonian and viscosity. A single deformation parameter ε unifies four perspectives (network, tropical algebra, viscous PDE, convex optimization) in a commutative diagram closed under Lipschitz conditions. Quantitative consequences include: the minimax-optimal generalization rate O(n^{-1/(d+2)}) for fixed t; adversarial robustness controlled by ε; backpropagation as the co-state equation of the Hamiltonian system for residual networks (Pontryagin Maximum Principle); scaling exponents consistent with the intrinsic dimension of the data, obtained via PDE quadrature; and a closed-form O(N) influence function (softmax attribution weights π_j) whose entropy landscape undergoes fold bifurcations as ε increases, each merging attribution basins.
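
To make the log-sum-exp case concrete, here is a minimal numerical sketch, not the authors' implementation. It assumes a one-dimensional input, affine pieces a_j x + b_j as the initial data, and one particular sign and normalization convention for the viscous Hamilton-Jacobi equation, u_t = (1/2) u_x^2 + (ε/2) u_xx; the paper's own conventions may differ. Under that assumption, the Hopf-Cole substitution u = ε log φ turns the equation into a heat equation for φ, and a sum of exponentials gives the log-sum-exp form used below. The names `lse_layer`, `attribution_weights`, `entropy`, and `pde_residual` are hypothetical and chosen only for illustration; `eps` stands for the deformation parameter ε.

```python
import numpy as np


def _scores(x, t, a, b, eps):
    # Exponent of each affine piece after Hopf-Cole propagation to time t
    # (assumed convention: u_t = 0.5*u_x**2 + 0.5*eps*u_xx).
    return (a * x + b + 0.5 * t * a**2) / eps


def lse_layer(x, t, a, b, eps):
    """u(x, t) = eps * log(sum_j exp((a_j*x + b_j + t*a_j**2/2) / eps))."""
    z = _scores(x, t, a, b, eps)
    m = z.max()  # shift by the max for numerical stability
    return eps * (m + np.log(np.exp(z - m).sum()))


def attribution_weights(x, t, a, b, eps):
    """Softmax weights pi_j over the affine pieces; O(N) in the number of pieces."""
    z = _scores(x, t, a, b, eps)
    w = np.exp(z - z.max())
    return w / w.sum()


def entropy(p):
    """Shannon entropy of a probability vector (natural log)."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def pde_residual(x, t, a, b, eps, h=1e-3):
    """Central-difference check of u_t - 0.5*u_x**2 - 0.5*eps*u_xx at (x, t)."""
    u = lambda xx, tt: lse_layer(xx, tt, a, b, eps)
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_x = (u(x + h, t) - u(x - h, t)) / (2 * h)
    u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h**2
    return u_t - 0.5 * u_x**2 - 0.5 * eps * u_xx


# Toy initial data (illustrative values only, not taken from the paper).
a = np.array([-1.0, 0.5, 2.0])
b = np.array([0.3, 0.0, -1.0])
x0, t0 = 0.4, 0.1

if __name__ == "__main__":
    print("PDE residual at eps=0.5:", pde_residual(x0, t0, a, b, 0.5))
    tropical = (a * x0 + b + 0.5 * t0 * a**2).max()  # max-plus (eps -> 0) value
    print("gap to tropical limit at eps=1e-3:", lse_layer(x0, t0, a, b, 1e-3) - tropical)
    for eps in (0.01, 0.5, 10.0):
        pi = attribution_weights(x0, t0, a, b, eps)
        print(f"eps={eps}: pi={pi}, entropy={entropy(pi)}")
```

Run as a script, this prints the finite-difference residual of the assumed equation, the gap between the layer and its max-plus limit (bounded by ε log N for N pieces), and the attribution weights at three values of ε. In this toy setting the weights concentrate on a single piece at small ε and spread toward uniform at large ε; the sketch does not attempt to reproduce the fold bifurcations described in the abstract.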