ChatPaper.ai

Dale meets Langevin: A Multiplicative Denoising Diffusion Model

October 3, 2025
Authors: Nishanth Shetty, Madhava Prasath, Chandra Sekhar Seelamantula
cs.AI

Abstract

Gradient descent has proven to be a powerful and effective technique for optimization in numerous machine learning applications. However, recent advances in computational neuroscience have shown that learning in the standard gradient-descent formulation is not consistent with learning in biological systems. This has opened up interesting avenues for building biologically inspired learning techniques. One such approach is inspired by Dale's law, which states that inhibitory and excitatory synapses do not swap roles during the course of learning. The resulting exponential gradient descent optimization scheme leads to log-normally distributed synaptic weights. Interestingly, the density that satisfies the Fokker-Planck equation corresponding to the stochastic differential equation (SDE) with geometric Brownian motion (GBM) is the log-normal density. Leveraging this connection, we start with the SDE governing geometric Brownian motion and show that discretizing the corresponding reverse-time SDE yields a multiplicative update rule, which, surprisingly, coincides with the sampling equivalent of the exponential gradient descent update founded on Dale's law. Furthermore, we propose a new formalism for multiplicative denoising score matching, subsuming the loss function proposed by Hyvärinen for non-negative data. Indeed, log-normally distributed data are positive, and the proposed score-matching formalism turns out to be a natural fit. This allows for the training of score-based models on image data and results in a novel multiplicative update scheme for sample generation starting from a log-normal density. Experimental results on the MNIST, Fashion MNIST, and Kuzushiji datasets demonstrate the generative capability of the new scheme. To the best of our knowledge, this is the first instance of a biologically inspired generative model employing multiplicative updates, founded on geometric Brownian motion.
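Two ideas the abstract connects can be sketched numerically: geometric Brownian motion evolves by a *multiplicative* step whose marginals are log-normal, and the exponential gradient descent update multiplies each weight by a positive factor, so signs never flip, consistent with Dale's law. The sketch below is illustrative only; the parameters (`mu`, `sigma`, `dt`, `n_steps`, `eta`) are arbitrary choices, not values from the paper, and the paper's reverse-time sampling scheme is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Geometric Brownian motion: dX = mu*X dt + sigma*X dW ---
# The exact solution over one step is multiplicative:
#   X_{k+1} = X_k * exp((mu - sigma^2/2)*dt + sigma*sqrt(dt)*z),  z ~ N(0, 1).
# Illustrative parameters (not from the paper):
mu, sigma, dt, n_steps = 0.0, 0.5, 1e-3, 500
n_samples = 20_000
T = n_steps * dt

x = np.ones(n_samples)  # start from a positive point
for _ in range(n_steps):
    z = rng.standard_normal(n_samples)
    x *= np.exp((mu - sigma**2 / 2) * dt + sigma * np.sqrt(dt) * z)

# Positivity is preserved, and log X is Gaussian with
# mean (mu - sigma^2/2)*T and variance sigma^2*T, i.e. X is log-normal
# (the density solving the Fokker-Planck equation for the GBM SDE).
log_x = np.log(x)
print(log_x.mean(), (mu - sigma**2 / 2) * T)  # close
print(log_x.var(), sigma**2 * T)              # close

# --- Exponential gradient descent (multiplicative update) ---
# w <- w * exp(-eta * grad) multiplies by a strictly positive factor,
# so each weight keeps its sign throughout learning (Dale's law).
def egd_step(w, grad, eta=0.1):
    return w * np.exp(-eta * grad)

w = np.array([0.5, -0.3, 1.2])   # excitatory and inhibitory weights
g = np.array([1.0, 1.0, -2.0])   # some gradient
w_new = egd_step(w, g)
assert np.all(np.sign(w_new) == np.sign(w))  # signs preserved
```

Note how the ordinary additive update `w - eta*g` could flip the sign of a small weight, whereas the exponential update cannot, which is the structural property the abstract ties to Dale's law.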