Revisiting Diffusion Models from a Langevin Dynamics Perspective

Abstract

Diffusion models are usually introduced from one of several perspectives, such as variational autoencoders (VAEs), score matching, or flow matching, and each comes with dense, technically demanding derivations that beginners find hard to follow. A classic question is: how does the reverse process invert the forward process and generate data from pure noise? This article systematically reorganizes diffusion models from a Langevin dynamics perspective, which gives a simpler, clearer, and more intuitive answer. We also address three further questions. How can diffusion models based on ordinary differential equations (ODEs) and on stochastic differential equations (SDEs) be unified in a single framework? Why are diffusion models theoretically superior to ordinary VAEs? Why is flow matching not fundamentally simpler than denoising or score matching, but instead equivalent to them in the maximum-likelihood sense? We argue that the Langevin perspective answers these questions clearly and directly. It bridges the gaps between existing interpretations of diffusion models, shows how the different formulations can be converted into one another within a common framework, and offers a path to understanding that has both pedagogical value for learners and deeper intuition for experienced researchers.