The Principles of Diffusion Models
October 24, 2025
Authors: Chieh-Hsin Lai, Yang Song, Dongjun Kim, Yuki Mitsufuji, Stefano Ermon
cs.AI
Abstract
This monograph presents the core principles that have guided the development
of diffusion models, tracing their origins and showing how diverse formulations
arise from shared mathematical ideas. Diffusion modeling starts by defining a
forward process that gradually corrupts data into noise, linking the data
distribution to a simple prior through a continuum of intermediate
distributions. The goal is to learn a reverse process that transforms noise
back into data while recovering the same intermediates. We describe three
complementary views. The variational view, inspired by variational
autoencoders, sees diffusion as learning to remove noise step by step. The
score-based view, rooted in energy-based modeling, learns the gradient of the
evolving data distribution, indicating how to nudge samples toward more likely
regions. The flow-based view, related to normalizing flows, treats generation
as following a smooth path that moves samples from noise to data under a
learned velocity field. These perspectives share a common backbone: a
time-dependent velocity field whose flow transports a simple prior to the data.
Sampling then amounts to solving a differential equation that evolves noise
into data along a continuous trajectory. On this foundation, the monograph
discusses guidance for controllable generation, efficient numerical solvers,
and diffusion-motivated flow-map models that learn direct mappings between
arbitrary times. It provides a conceptual and mathematically grounded
understanding of diffusion models for readers with basic deep-learning
knowledge.
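
As a rough illustration of the picture sketched in the abstract (a forward process that blends data into noise, and sampling as solving a differential equation driven by a velocity field), the following toy example may help. Everything specific in it is an assumption made for illustration and is not taken from the monograph: the data distribution is a 1D Gaussian N(MU, S^2), the forward path is the linear interpolation x_t = (1 - t) x_0 + t eps, and the velocity field is written in closed form for this Gaussian case instead of being learned by a network. Only the Python standard library is used.

```python
import random
import statistics

# Assumed toy data distribution N(MU, S^2); hypothetical values for illustration.
MU, S = 2.0, 0.5


def forward(x0, eps, t):
    """Forward process (assumed linear path): t=0 gives data, t=1 gives pure noise."""
    return (1.0 - t) * x0 + t * eps


def velocity(x, t):
    """Closed-form velocity E[eps - x0 | x_t = x] for the Gaussian toy.

    In a real diffusion or flow model this field would be learned.
    """
    mean_t = (1.0 - t) * MU                    # mean of x_t
    var_t = (1.0 - t) ** 2 * S ** 2 + t ** 2   # variance of x_t
    cov = t - (1.0 - t) * S ** 2               # Cov(eps - x0, x_t)
    return -MU + cov / var_t * (x - mean_t)


def sample_one(x, steps=100):
    """Euler-integrate dx/dt = velocity(x, t) from t=1 (noise) down to t=0 (data)."""
    dt = 1.0 / steps
    for k in range(steps):
        t = 1.0 - k * dt
        x -= dt * velocity(x, t)
    return x


def sample(n, steps=100, seed=0):
    """Draw n noise samples from N(0, 1) and transport each one to the data side."""
    rng = random.Random(seed)
    return [sample_one(rng.gauss(0.0, 1.0), steps) for _ in range(n)]


if __name__ == "__main__":
    xs = sample(5000)
    # The sample mean and standard deviation should land close to MU and S.
    print(statistics.fmean(xs), statistics.pstdev(xs))
```

In this Gaussian toy the exact flow maps a noise value z to MU + S * z, so the Euler result can be checked against that closed form. With a learned velocity field no such shortcut exists, which is where the numerical solvers mentioned in the abstract come in.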