Rethinking the Diffusion Model from a Langevin Perspective

Abstract

Diffusion models are often introduced from multiple perspectives, such as variational autoencoders (VAEs), score matching, or flow matching, and are accompanied by dense, technically demanding mathematics that beginners can find difficult to grasp. A classic question is: how does the reverse process invert the forward process to generate data from pure noise? This article systematically organizes diffusion models from a fresh Langevin dynamics perspective and offers a simpler, clearer, and more intuitive answer. We also address the following questions: How can ODE-based and SDE-based diffusion models be unified under a single framework? Why are diffusion models theoretically superior to ordinary VAEs? Why is flow matching not fundamentally simpler than denoising or score matching, but instead equivalent to them under maximum likelihood? We argue that the Langevin perspective gives clear and direct answers to these questions. It bridges existing interpretations of diffusion models, shows how different formulations can be converted into one another within a common framework, and offers pedagogical value for both learners and experienced researchers seeking deeper intuition.

