The Hamilton-Jacobi Theory of Deep Learning

May 27, 2026
Authors: Jose Marie Antonio Miñoza, Erika Fille T. Legara, Christopher P. Monterola
cs.AI

Abstract

This paper identifies the training of a neural network, exactly, as a search through Hamilton-Jacobi initial-value problems: each gradient step selects the initial data of a viscous Hamilton-Jacobi equation whose Hopf-Cole propagator best fits the observations. At inference, the input is the spatial point at which that solution is evaluated, and the initial condition is already encoded in the weights. The correspondence is exact for log-sum-exp layers and structural for broader architectures: residual networks, transformers, and recurrent architectures (RNNs, LSTMs, SSMs) each discretize the same class of Hamilton-Jacobi equations, with an architecture-dependent Hamiltonian and viscosity. A single deformation parameter ε unifies all four perspectives (network, tropical algebra, viscous PDE, convex optimization) in a commutative diagram closed under Lipschitz conditions. Quantitative consequences include: the minimax-optimal generalization rate O(n^{-1/(d+2)}) for fixed t; adversarial robustness controlled by ε; backpropagation as the co-state equation of the Hamiltonian system for residual networks (Pontryagin Maximum Principle); scaling exponents consistent with the intrinsic dimension of the data, obtained via PDE quadrature; and a closed-form O(N) influence function (softmax attribution weights π_j) whose entropy landscape undergoes fold bifurcations as ε increases, each merging attribution basins.
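
The link between log-sum-exp layers and viscous Hamilton-Jacobi equations can be illustrated with a small numerical sketch. It is not taken from the paper; it assumes the textbook one-dimensional equation u_t + ½|u_x|² = (ε/2) u_xx with initial data concentrated at a few points y_j with offsets g_j. Under that assumption, the Hopf-Cole substitution u = -ε log φ turns the problem into a heat equation, and the solution is, up to an x-independent normalization term omitted here, a soft-min (negated log-sum-exp) of the costs c_j(x) = |x - y_j|²/(2t) + g_j. The function names, anchors, and offsets below are hypothetical, and the paper's own normalization of ε may differ.

```python
import math

def hopf_lax_costs(x, t, anchors, offsets):
    """Costs c_j(x) = |x - y_j|^2 / (2 t) + g_j for 1-D anchors y_j (illustrative)."""
    return [(x - y) ** 2 / (2.0 * t) + g for y, g in zip(anchors, offsets)]

def soft_min(values, eps):
    """-eps * log(sum_j exp(-v_j / eps)), shifted by min(values) for stability."""
    m = min(values)
    return m - eps * math.log(sum(math.exp(-(v - m) / eps) for v in values))

def attribution_weights(values, eps):
    """Softmax weights pi_j proportional to exp(-v_j / eps); they sum to 1."""
    m = min(values)
    w = [math.exp(-(v - m) / eps) for v in values]
    z = sum(w)
    return [wi / z for wi in w]

# Hypothetical initial data: three anchor points with offsets.
anchors = [-1.0, 0.0, 2.0]
offsets = [0.0, 0.5, -0.2]
costs = hopf_lax_costs(x=0.3, t=1.0, anchors=anchors, offsets=offsets)

# Small eps: the soft-min approaches the hard minimum (inviscid Hopf-Lax limit).
u_sharp = soft_min(costs, eps=1e-3)
# Larger eps: the value is smoothed and the weights spread across anchors.
u_smooth = soft_min(costs, eps=0.5)
pi = attribution_weights(costs, eps=0.5)
```

In this toy setting the soft-min never exceeds the hard minimum and converges to it as ε shrinks, which mirrors the tropical (min-plus) limit mentioned in the abstract. The weights `pi` play the role of the softmax attribution weights π_j, again only under the assumptions stated above.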