Hamilton-Jacobi Theory of Deep Learning

Abstract

This paper identifies the training of a neural network exactly as a search over Hamilton-Jacobi initial-value problems: each gradient step selects the initial data of a viscous Hamilton-Jacobi equation whose Hopf-Cole propagator best fits the observations. At inference, the input is the spatial point at which that solution is evaluated, and the initial condition is already encoded in the weights. The correspondence is exact for log-sum-exp layers and structural for broader architectures: residual networks, transformers, and recurrent architectures (RNNs, LSTMs, SSMs) each discretize the same class of Hamilton-Jacobi equations, with an architecture-dependent Hamiltonian and viscosity. A single deformation parameter ε unifies all four perspectives (network, tropical algebra, viscous PDE, convex optimization) in a commutative diagram that is closed under Lipschitz conditions. Quantitative consequences include: the minimax-optimal generalization rate O(n^{-1/(d+2)}) for fixed t; adversarial robustness controlled by ε; backpropagation as the co-state equation of the Hamiltonian system for residual networks (Pontryagin Maximum Principle); scaling exponents consistent with the intrinsic dimension of the data, obtained via PDE quadrature; and a closed-form O(N) influence function (the softmax attribution weights π_j) whose entropy landscape undergoes fold bifurcations as ε increases, each of which merges attribution basins.
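To make the role of ε concrete, the following minimal sketch evaluates a log-sum-exp layer and its softmax weights π_j at several values of ε. It assumes the common parameterization f_ε(x) = ε log Σ_j exp((⟨a_j, x⟩ + b_j)/ε); the abstract does not state the layer's formula, so this form, the function names `lse_layer` and `entropy`, and the numerical values of `A`, `b`, and `x` are illustrative assumptions rather than the paper's definitions. Under that assumption, the layer is bounded between the tropical (max-plus) value max_j(⟨a_j, x⟩ + b_j) and that value plus ε log N, and the entropy of π does not decrease as ε grows.

```python
import math

def lse_layer(x, A, b, eps):
    """Return (f_eps(x), softmax weights pi, tropical max) for an assumed
    log-sum-exp layer f_eps(x) = eps * log(sum_j exp((<a_j, x> + b_j) / eps))."""
    scores = [sum(ai * xi for ai, xi in zip(a, x)) + bj for a, bj in zip(A, b)]
    m = max(scores)                              # tropical (eps -> 0) limit
    z = [math.exp((s - m) / eps) for s in scores]  # shifted for numerical stability
    total = sum(z)
    value = m + eps * math.log(total)
    weights = [zi / total for zi in z]           # softmax attribution weights pi_j
    return value, weights, m

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

# Hypothetical affine pieces a_j, b_j and an input point x.
A = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.5, 0.2]
x = [0.3, 0.4]

for eps in (0.01, 0.1, 1.0, 10.0):
    value, pi, tropical = lse_layer(x, A, b, eps)
    upper = tropical + eps * math.log(len(b))
    print(f"eps={eps:<5} f={value:.4f}  max={tropical:.4f}  "
          f"max+eps*logN={upper:.4f}  H(pi)={entropy(pi):.4f}")
```

Computing all N weights costs one pass over the affine pieces, which is in line with the O(N) cost quoted for the influence function. The sketch does not attempt to reproduce the fold bifurcations described in the abstract.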