Dale meets Langevin: A Multiplicative Denoising Diffusion Model

Abstract

Gradient descent has proven to be a powerful and effective optimization technique in numerous machine learning applications. Recent advances in computational neuroscience, however, have shown that learning under the standard gradient descent formulation is not consistent with learning in biological systems. This has opened up interesting avenues for building biologically inspired learning techniques. One such approach is inspired by Dale's law, which states that inhibitory and excitatory synapses do not swap roles during the course of learning. The resulting exponential gradient descent optimization scheme leads to log-normally distributed synaptic weights. Interestingly, the log-normal density is the density that satisfies the Fokker-Planck equation corresponding to the stochastic differential equation (SDE) of geometric Brownian motion (GBM). Leveraging this connection, we start from the SDE governing geometric Brownian motion and show that discretizing the corresponding reverse-time SDE yields a multiplicative update rule which, surprisingly, coincides with the sampling equivalent of the exponential gradient descent update founded on Dale's law. Furthermore, we propose a new formalism for multiplicative denoising score matching that subsumes the loss function proposed by Hyvärinen for non-negative data. Since log-normally distributed data are positive, the proposed score-matching formalism is a natural fit. It allows score-based models to be trained on image data and yields a novel multiplicative update scheme for sample generation starting from a log-normal density. Experimental results on the MNIST, Fashion MNIST, and Kuzushiji datasets demonstrate the generative capability of the new scheme. To the best of our knowledge, this is the first instance of a biologically inspired generative model employing multiplicative updates founded on geometric Brownian motion.
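The abstract rests on two facts that are easy to check numerically: the marginal of geometric Brownian motion is log-normal, and a Langevin step taken in log space becomes a multiplicative update that keeps samples positive. The minimal NumPy sketch below illustrates both under stated assumptions. It uses the closed-form GBM solution instead of solving the Fokker-Planck equation, and its sampler is plain unadjusted Langevin dynamics on log x with a known log-normal score. It is not the paper's reverse-time GBM update or its learned score model, and all function names (`gbm_exact`, `log_space_score`, `multiplicative_langevin`) and parameter values are hypothetical.

```python
import numpy as np


def gbm_exact(x0, mu, sigma, t, n_samples, rng):
    """Draw samples of X_t for geometric Brownian motion dX = mu*X dt + sigma*X dW.

    Uses the closed-form solution X_t = x0 * exp((mu - sigma**2 / 2) * t + sigma * W_t),
    so log X_t is Gaussian and X_t is log-normal.
    """
    w_t = rng.normal(0.0, np.sqrt(t), size=n_samples)  # W_t ~ N(0, t)
    return x0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * w_t)


def log_space_score(y, m, s):
    """Score d/dy log p_Y(y) of Y = log X when Y ~ N(m, s**2), i.e. X is log-normal."""
    return -(y - m) / s**2


def multiplicative_langevin(x, m, s, step, n_steps, rng):
    """Unadjusted Langevin dynamics on y = log x, written as an update on x.

    Illustrative only (hypothetical helper, not the paper's reverse-GBM rule).
    The additive step y <- y + (step / 2) * score(y) + sqrt(step) * z
    becomes x <- x * exp((step / 2) * score(log x) + sqrt(step) * z),
    so every iterate stays strictly positive.
    """
    for _ in range(n_steps):
        z = rng.standard_normal(x.shape)
        drift = 0.5 * step * log_space_score(np.log(x), m, s)
        x = x * np.exp(drift + np.sqrt(step) * z)
    return x


rng = np.random.default_rng(0)

# Forward GBM: log X_t should have mean log(x0) + (mu - sigma^2 / 2) t and variance sigma^2 t.
x0, mu, sigma, t = 1.0, 0.1, 0.5, 2.0
xt = gbm_exact(x0, mu, sigma, t, 200_000, rng)
print("log X_t mean:", np.log(xt).mean(), "theory:", np.log(x0) + (mu - 0.5 * sigma**2) * t)
print("log X_t var: ", np.log(xt).var(), "theory:", sigma**2 * t)

# Multiplicative Langevin targeting that log-normal, with every chain started at x = 5.
m, s = np.log(x0) + (mu - 0.5 * sigma**2) * t, sigma * np.sqrt(t)
samples = multiplicative_langevin(np.full(10_000, 5.0), m, s, step=0.05, n_steps=1_000, rng=rng)
print("all positive:", bool(samples.min() > 0))
print("log-sample mean/var:", np.log(samples).mean(), np.log(samples).var(), "target:", m, s**2)
```

Because the exponentiated step multiplies x, the sketch's update has the form x ← x ⊙ exp(η·g + noise), an exponentiated-gradient step with added noise, which never flips the sign of x. The finite step size leaves a small discretization bias, so the stationary log-variance comes out slightly above s².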
