Dale meets Langevin: A Multiplicative Denoising Diffusion Model

Abstract

Gradient descent has proven to be a powerful and effective optimization technique in numerous machine learning applications. Recent advances in computational neuroscience have shown that learning under the standard gradient descent formulation is not consistent with learning in biological systems. This has opened up interesting avenues for building biologically inspired learning techniques. One such approach is inspired by Dale's law, which states that inhibitory and excitatory synapses do not swap roles during the course of learning. The resulting exponential gradient descent optimization scheme leads to log-normally distributed synaptic weights. Interestingly, the density that satisfies the Fokker-Planck equation corresponding to the stochastic differential equation (SDE) with geometric Brownian motion (GBM) is the log-normal density. Leveraging this connection, we start with the SDE governing geometric Brownian motion and show that discretizing the corresponding reverse-time SDE yields a multiplicative update rule, which, surprisingly, coincides with the sampling equivalent of the exponential gradient descent update founded on Dale's law. Furthermore, we propose a new formalism for multiplicative denoising score-matching, which subsumes the loss function proposed by Hyvärinen for non-negative data. Indeed, log-normally distributed data is positive, and the proposed score-matching formalism turns out to be a natural fit. This allows score-based models to be trained on image data and results in a novel multiplicative update scheme for sample generation starting from a log-normal density. Experimental results on the MNIST, Fashion MNIST, and Kuzushiji datasets demonstrate the generative capability of the new scheme. To the best of our knowledge, this is the first instance of a biologically inspired generative model employing multiplicative updates, founded on geometric Brownian motion.
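The link between geometric Brownian motion and the log-normal density can be checked numerically. The following is a minimal sketch, not the paper's reverse-time sampler or score model: it simulates only the forward SDE dX = mu X dt + sigma X dW with a multiplicative step. The function name `simulate_gbm` and all parameter values are assumptions chosen for illustration.

```python
import math
import random
import statistics


def simulate_gbm(x0, mu, sigma, n_steps, dt, n_paths, seed=0):
    """Simulate dX = mu*X dt + sigma*X dW using multiplicative updates.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma ** 2) * dt   # drift of log X per step
    scale = sigma * math.sqrt(dt)          # noise scale of log X per step
    finals = []
    for _ in range(n_paths):
        x = x0
        for _ in range(n_steps):
            # Multiplying by exp(...) > 0 means a positive sample stays positive.
            x *= math.exp(drift + scale * rng.gauss(0.0, 1.0))
        finals.append(x)
    return finals


# Illustrative settings: total time T = n_steps * dt = 1.0
samples = simulate_gbm(x0=1.0, mu=0.0, sigma=0.5, n_steps=50, dt=0.02, n_paths=5000)
log_samples = [math.log(x) for x in samples]

# For a log-normal X_T, log X_T is Gaussian with
# mean log(x0) + (mu - sigma^2 / 2) * T and standard deviation sigma * sqrt(T).
print("mean of log X_T:", statistics.fmean(log_samples))
print("std of log X_T: ", statistics.pstdev(log_samples))
```

With these settings the theoretical mean and standard deviation of log X_T are -0.125 and 0.5, and the empirical values should land close to them. Because every step multiplies by a positive factor, the samples never change sign, which mirrors the sign-preserving behavior that Dale's law motivates.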
