The Principles of Diffusion Models

October 24, 2025
Authors: Chieh-Hsin Lai, Yang Song, Dongjun Kim, Yuki Mitsufuji, Stefano Ermon
cs.AI

Abstract

This monograph presents the core principles that have guided the development of diffusion models, tracing their origins and showing how diverse formulations arise from shared mathematical ideas. Diffusion modeling starts by defining a forward process that gradually corrupts data into noise, linking the data distribution to a simple prior through a continuum of intermediate distributions. The goal is to learn a reverse process that transforms noise back into data while recovering the same intermediates. We describe three complementary views. The variational view, inspired by variational autoencoders, sees diffusion as learning to remove noise step by step. The score-based view, rooted in energy-based modeling, learns the gradient of the evolving data distribution, indicating how to nudge samples toward more likely regions. The flow-based view, related to normalizing flows, treats generation as following a smooth path that moves samples from noise to data under a learned velocity field. These perspectives share a common backbone: a time-dependent velocity field whose flow transports a simple prior to the data. Sampling then amounts to solving a differential equation that evolves noise into data along a continuous trajectory. On this foundation, the monograph discusses guidance for controllable generation, efficient numerical solvers, and diffusion-motivated flow-map models that learn direct mappings between arbitrary times. It provides a conceptual and mathematically grounded understanding of diffusion models for readers with basic deep-learning knowledge.
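
The statement that sampling amounts to solving a differential equation can be made concrete with a toy case. The sketch below is an illustration under stated assumptions, not the monograph's own code: it assumes one-dimensional Gaussian data N(MU, S^2), a forward process x_t = x_0 + t * eps (noise level sigma(t) = t), and the analytic score of the resulting intermediates in place of a learned network. All names (`perturb`, `score`, `velocity`, `sample`, `exact`) are hypothetical.

```python
import math

MU, S = 2.0, 0.5   # toy data distribution N(MU, S^2); assumed for illustration
T = 10.0           # largest noise level; sigma(t) = t is an assumed schedule

def perturb(x0, t, eps):
    """Forward process: corrupt a data point x0 with Gaussian noise of scale t."""
    return x0 + t * eps

def score(x, t):
    """Exact score of the intermediate p_t = N(MU, S^2 + t^2).

    In a real model this gradient of log p_t would come from a trained network.
    """
    return -(x - MU) / (S**2 + t**2)

def velocity(x, t):
    """Velocity field of the deterministic sampling ODE for sigma(t) = t:
    dx/dt = -t * score(x, t)."""
    return -t * score(x, t)

def sample(x_T, n_steps=1000):
    """Evolve a noisy point from t = T down to t = 0 with explicit Euler steps."""
    x, dt = x_T, T / n_steps
    for i in range(n_steps):
        t = T - i * dt
        x -= dt * velocity(x, t)   # step backward in time
    return x

def exact(x_T):
    """Closed-form solution of the same ODE at t = 0, available only because
    the toy data distribution is Gaussian."""
    return MU + (x_T - MU) * S / math.sqrt(S**2 + T**2)

if __name__ == "__main__":
    x_T = perturb(MU, T, 1.0)      # a point one noise standard deviation out
    print(sample(x_T), exact(x_T))
```

In this toy setting the Euler trajectory should land close to the closed-form endpoint, and a starting point drawn from N(MU, S^2 + T^2) is carried to a sample from N(MU, S^2). With real data the score has no closed form, which is where the learned model and the more efficient solvers mentioned in the abstract come in.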