Thermodynamic Natural Gradient Descent

Abstract

Second-order training methods have better convergence properties than gradient descent but are rarely used in practice for large-scale training due to their computational overhead. This can be viewed as a hardware limitation (imposed by digital computers). Here we show that natural gradient descent (NGD), a second-order method, can have a similar computational complexity per iteration to a first-order method, when employing appropriate hardware. We present a new hybrid digital-analog algorithm for training neural networks that is equivalent to NGD in a certain parameter regime but avoids prohibitively costly linear system solves. Our algorithm exploits the thermodynamic properties of an analog system at equilibrium, and hence requires an analog thermodynamic computer. The training occurs in a hybrid digital-analog loop, where the gradient and Fisher information matrix (or any other positive semi-definite curvature matrix) are calculated at given time intervals while the analog dynamics take place. We numerically demonstrate the superiority of this approach over state-of-the-art digital first- and second-order training methods on classification tasks and language model fine-tuning tasks.
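The "costly linear system solve" mentioned above can be made concrete with a small numerical sketch. A damped NGD step needs the direction x that satisfies (F + λI) x = g, where F is the Fisher information matrix and g is the gradient. The abstract does not spell out the analog dynamics, so the code below rests on assumptions: it uses a noise-free relaxation dx/dt = -(F + λI) x + g, integrated digitally with Euler steps, as a stand-in for a physical system settling to equilibrium. The function names, the damping term λ, the step sizes, and the toy data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def ngd_direction_direct(fisher, grad, damping):
    """Reference: solve (F + damping * I) x = grad with a digital linear solve."""
    a = fisher + damping * np.eye(fisher.shape[0])
    return np.linalg.solve(a, grad)


def ngd_direction_relaxed(fisher, grad, damping, dt=0.01, steps=20000):
    """Euler-integrate dx/dt = -(F + damping * I) x + grad, starting from x = 0.

    Assumption (not from the abstract): a noise-free linear relaxation stands
    in for the analog system. Its fixed point satisfies
    (F + damping * I) x = grad, so no explicit solve is needed.
    """
    a = fisher + damping * np.eye(fisher.shape[0])
    x = np.zeros_like(grad)
    for _ in range(steps):
        x = x + dt * (grad - a @ x)
    return x


# Toy problem with hypothetical data: a positive semi-definite curvature
# matrix built from a random Jacobian, and a random gradient.
rng = np.random.default_rng(0)
jac = rng.normal(size=(20, 4))
fisher = jac.T @ jac / 20
grad = rng.normal(size=4)
damping = 0.1

direct = ngd_direction_direct(fisher, grad, damping)
relaxed = ngd_direction_relaxed(fisher, grad, damping)

# One NGD parameter update using the relaxed direction (learning rate 0.5).
theta = np.zeros(4)
theta_next = theta - 0.5 * relaxed
```

Both routines should agree to numerical precision on this toy problem. The sketch only illustrates why an equilibrating system can replace the solve; it says nothing about the cost or noise behavior of real analog hardware.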
