Test-time scaling of diffusions with flow maps

Abstract

A common recipe for improving diffusion models at test time, so that samples score highly against a user-specified reward, is to introduce the gradient of the reward into the dynamics of the diffusion itself. This procedure is often ill-posed, because user-specified rewards are usually well defined only on the data distribution at the end of generation. A common workaround is to use a denoiser to estimate what a sample would have been at the end of generation. We propose a simple alternative that works directly with a flow map. By exploiting a relationship between the flow map and the velocity field governing the instantaneous transport, we construct an algorithm, Flow Map Trajectory Tilting (FMTT), which provably performs better ascent on the reward than standard test-time methods involving the gradient of the reward. The approach can be used either to perform exact sampling via importance weighting or to perform a principled search that identifies local maximizers of the reward-tilted distribution. We demonstrate the efficacy of our approach against other look-ahead techniques, and show how the flow map enables engagement with complicated reward functions that make possible new forms of image editing, e.g., by interfacing with vision-language models.
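As a rough illustration of the look-ahead idea described above, and not of the FMTT algorithm itself, the following sketch uses a 1D Gaussian toy problem in which both the flow map and the denoiser have closed forms. Everything in it is an assumption made for illustration: the linear interpolant x_t = (1 - t) z + t x_1 with independent z ~ N(0, 1) and x_1 ~ N(M, S^2), the quadratic reward, and all function names. It compares the two ways of estimating the end of generation from an intermediate x_t, then applies self-normalized importance weights exp(reward) to the flow-map endpoints to estimate the mean of the reward-tilted distribution.

```python
import math
import random
import statistics

M, S = 2.0, 0.5  # assumed toy data distribution: x_1 ~ N(M, S^2)


def mean_std(t):
    """Mean and std of x_t = (1 - t) * z + t * x_1, with z ~ N(0, 1) independent of x_1."""
    return t * M, math.sqrt((1.0 - t) ** 2 + (t * S) ** 2)


def flow_map(x, s, t):
    """Closed-form flow map X_{s,t} of the probability flow ODE for this Gaussian toy.

    The ODE keeps the standardized coordinate (x - mean) / std fixed, so the map is affine.
    """
    m_s, sd_s = mean_std(s)
    m_t, sd_t = mean_std(t)
    return m_t + (sd_t / sd_s) * (x - m_s)


def denoiser(x, t):
    """Posterior mean E[x_1 | x_t = x] for the same toy (Gaussian conditioning)."""
    m_t, sd_t = mean_std(t)
    return M + (t * S ** 2 / sd_t ** 2) * (x - m_t)


def reward(x):
    """Hypothetical reward, meant to be evaluated on clean samples; prefers x near 3."""
    return -0.5 * (x - 3.0) ** 2


rng = random.Random(0)
t = 0.3
xt = []
for _ in range(20_000):
    z = rng.gauss(0.0, 1.0)
    x1 = rng.gauss(M, S)
    xt.append((1.0 - t) * z + t * x1)

# Two look-ahead estimates of the end of generation from the same noisy x_t.
end_flow = [flow_map(x, t, 1.0) for x in xt]
end_denoise = [denoiser(x, t) for x in xt]

# Flow-map endpoints follow the data distribution; posterior means are shrunk toward M.
spread_flow = statistics.pstdev(end_flow)
spread_denoise = statistics.pstdev(end_denoise)

# Self-normalized importance weights for the tilted target p(x) * exp(reward(x)).
weights = [math.exp(reward(x)) for x in end_flow]
tilted_mean = sum(w * x for w, x in zip(weights, end_flow)) / sum(weights)

print(f"std of flow-map endpoints: {spread_flow:.3f}")
print(f"std of denoiser endpoints: {spread_denoise:.3f}")
print(f"importance-weighted mean:  {tilted_mean:.3f}")
```

In this toy the tilted distribution is itself Gaussian with mean (M / S^2 + 3) / (1 / S^2 + 1) = 2.2, so the weighted estimate can be checked against a known value. The denoiser endpoints cluster tightly around M because a posterior mean averages over all plausible clean samples. That gives one intuition for why a reward evaluated on denoiser outputs can be less informative than one evaluated on flow-map outputs.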
