Ψ-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models

June 2, 2025
Authors: Taehoon Yoon, Yunhong Min, Kyeongmin Yeo, Minhyuk Sung
cs.AI

Abstract

We introduce Ψ-Sampler, a Sequential Monte Carlo (SMC) based framework incorporating initial particle sampling via the preconditioned Crank-Nicolson Langevin (pCNL) algorithm for effective inference-time reward alignment with a score-based generative model. Inference-time reward alignment with score-based generative models has recently gained significant traction, following a broader paradigm shift from pre-training to post-training optimization. At the core of this trend is the application of SMC to the denoising process. However, existing methods typically initialize particles from the Gaussian prior, which inadequately captures reward-relevant regions and results in reduced sampling efficiency. We demonstrate that initializing from the reward-aware posterior significantly improves alignment performance. To enable posterior sampling in high-dimensional latent spaces, we introduce the pCNL algorithm, which combines dimension-robust proposals with gradient-informed dynamics. This approach enables efficient and scalable posterior sampling and consistently improves performance across various reward alignment tasks, including layout-to-image generation, quantity-aware generation, and aesthetic-preference generation, as demonstrated in our experiments.
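To make the pCNL update concrete, the following is a minimal finite-dimensional sketch of one pCNL transition over the initial noise, assuming a standard Gaussian prior (identity covariance) and a user-supplied differentiable potential phi(u), e.g., a negatively scaled reward of the decoded sample. The names phi, decode, reward, and the step size delta are illustrative placeholders, not the authors' implementation. The proposal is the semi-implicit Crank-Nicolson discretization of preconditioned Langevin dynamics, followed by a standard Metropolis-Hastings correction:

    import torch

    def pcnl_step(u, phi, delta=0.1):
        # One pCNL transition targeting pi(u) proportional to
        # exp(-0.5*||u||^2 - phi(u)): a standard Gaussian prior
        # tilted by a reward-derived potential phi.
        def grad_phi(x):
            x = x.detach().requires_grad_(True)
            (g,) = torch.autograd.grad(phi(x), x)
            return g

        def log_target(x):
            return -0.5 * x.pow(2).sum() - phi(x)

        # Semi-implicit (Crank-Nicolson) discretization of preconditioned
        # Langevin dynamics:
        #   v = ((2 - d)*u - 2d*grad_phi(u) + sqrt(8d)*w) / (2 + d),
        # with w ~ N(0, I), giving a Gaussian proposal with the mean/std below.
        def mean(x):
            return ((2 - delta) * x - 2 * delta * grad_phi(x)) / (2 + delta)

        std = (8 * delta) ** 0.5 / (2 + delta)
        v = mean(u) + std * torch.randn_like(u)

        # Metropolis-Hastings correction for the asymmetric Gaussian proposal
        # (normalizing constants cancel since both directions share std).
        def log_q(x_from, x_to):
            return -(x_to - mean(x_from)).pow(2).sum() / (2 * std ** 2)

        log_alpha = log_target(v) + log_q(v, u) - log_target(u) - log_q(u, v)
        accepted = torch.log(torch.rand(())) < log_alpha
        return (v.detach(), True) if accepted else (u, False)

In this reading, particles obtained after a short chain of such steps would serve as the reward-aware initialization for the subsequent SMC run over the denoising process; a hypothetical potential could be phi = lambda u: -reward(decode(u)) / tau for some temperature tau, provided reward and decode are differentiable so the gradient term is available.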