Ψ-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models
June 2, 2025
Authors: Taehoon Yoon, Yunhong Min, Kyeongmin Yeo, Minhyuk Sung
cs.AI
Abstract
We introduce Ψ-Sampler, an SMC-based framework incorporating preconditioned
Crank-Nicolson Langevin (pCNL) initial particle sampling for effective
inference-time reward alignment with a score-based generative model.
Inference-time reward alignment with score-based
generative models has recently gained significant traction, following a broader
paradigm shift from pre-training to post-training optimization. At the core of
this trend is the application of Sequential Monte Carlo (SMC) to the denoising
process. However, existing methods typically initialize particles from the
Gaussian prior, which inadequately captures reward-relevant regions and results
in reduced sampling efficiency. We demonstrate that initializing from the
reward-aware posterior significantly improves alignment performance. To enable
posterior sampling in high-dimensional latent spaces, we introduce the
pCNL algorithm, which combines
dimension-robust proposals with gradient-informed dynamics. This approach
enables efficient and scalable posterior sampling and consistently improves
performance across various reward alignment tasks, including layout-to-image
generation, quantity-aware generation, and aesthetic-preference generation, as
demonstrated in our experiments.
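To make the SMC-over-denoising mechanism concrete, the sketch below shows the generic particle loop this line of work builds on: propagate each particle one reverse-diffusion step, reweight by the ratio of successive intermediate reward potentials, and resample when the effective sample size collapses. This is a minimal illustration, not the paper's implementation; `denoise_step` and `log_potential` are hypothetical placeholders for a reverse-diffusion transition and an intermediate reward estimate, and the initial `particles` are exactly where the paper intervenes, drawing them from the reward-aware posterior instead of N(0, I).

```python
import numpy as np

def smc_denoise(particles, denoise_step, log_potential, n_steps,
                rng=np.random.default_rng()):
    """Minimal SMC-guided denoising sketch (hypothetical helper names).

    particles:     (K, d) initial latents; ideally drawn from the
                   reward-aware posterior rather than the Gaussian prior.
    denoise_step:  one reverse-diffusion transition applied to a particle.
    log_potential: intermediate log-reward estimate used as the SMC
                   twisting potential.
    """
    K = len(particles)
    log_w = np.full(K, -np.log(K))  # uniform initial log-weights
    log_g_prev = np.array([log_potential(x, 0) for x in particles])

    for t in range(n_steps):
        # Propagate every particle one denoising step.
        particles = np.stack([denoise_step(x, t) for x in particles])

        # Reweight by the ratio of successive twisting potentials.
        log_g = np.array([log_potential(x, t + 1) for x in particles])
        log_w += log_g - log_g_prev
        log_w -= np.logaddexp.reduce(log_w)  # normalize in log space

        # Multinomial resampling when the effective sample size drops.
        ess = 1.0 / np.sum(np.exp(2 * log_w))
        if ess < K / 2:
            idx = rng.choice(K, size=K, p=np.exp(log_w))
            particles, log_g = particles[idx], log_g[idx]
            log_w = np.full(K, -np.log(K))

        log_g_prev = log_g

    return particles, log_w
```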
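For the posterior initialization itself, here is a hedged, finite-dimensional sketch of one pCNL Metropolis-Hastings step targeting a reward-tilted standard Gaussian prior, pi(u) proportional to exp(-Phi(u)) N(u; 0, I). It follows the Crank-Nicolson Langevin proposal from the function-space MCMC literature (e.g., Cotter et al., 2013) with an identity preconditioner; `phi` and `grad_phi` are assumed callables for the potential Phi and its gradient, and the acceptance ratio is computed directly from the Gaussian proposal densities, which is exact in finite dimensions.

```python
import numpy as np

def pcnl_step(u, phi, grad_phi, delta=0.1, rng=None):
    """One preconditioned Crank-Nicolson Langevin (pCNL) MH step.

    Target: pi(u) ~ exp(-phi(u)) * N(u; 0, I), a standard Gaussian prior
    tilted by a reward potential phi (assumes identity preconditioner
    and a 1-D latent vector u).

    Proposal (Crank-Nicolson discretization of preconditioned Langevin):
        v = ((2 - d)*u - 2*d*grad_phi(u) + sqrt(8*d)*w) / (2 + d),
        with w ~ N(0, I) and step size d = delta.
    """
    rng = rng or np.random.default_rng()
    a = (2 - delta) / (2 + delta)          # contraction toward the origin
    b = 2 * delta / (2 + delta)            # Langevin drift weight
    s = np.sqrt(8 * delta) / (2 + delta)   # proposal noise scale

    mean_u = a * u - b * grad_phi(u)
    v = mean_u + s * rng.standard_normal(u.shape)

    # Finite-dimensional Metropolis-Hastings correction, computed from the
    # Gaussian proposal densities (normalizing constants cancel).
    mean_v = a * v - b * grad_phi(v)
    log_pi = lambda x: -phi(x) - 0.5 * np.dot(x, x)
    log_q_uv = -np.dot(v - mean_u, v - mean_u) / (2 * s**2)  # q(u -> v)
    log_q_vu = -np.dot(u - mean_v, u - mean_v) / (2 * s**2)  # q(v -> u)

    log_alpha = log_pi(v) + log_q_vu - log_pi(u) - log_q_uv
    return (v, True) if np.log(rng.uniform()) < log_alpha else (u, False)
```

Dropping the gradient term (b = 0) recovers the plain pCN proposal, whose acceptance rate is robust to dimension because it preserves the Gaussian prior exactly; the Langevin drift adds reward-seeking guidance. That combination of dimension-robust proposals with gradient-informed dynamics is what the abstract describes.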