Ψ-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models

Abstract

We introduce Psi-Sampler, an SMC-based framework incorporating pCNL-based initial particle sampling for effective inference-time reward alignment with a score-based generative model. Inference-time reward alignment with score-based generative models has recently gained significant traction, following a broader paradigm shift from pre-training to post-training optimization. At the core of this trend is the application of Sequential Monte Carlo (SMC) to the denoising process. However, existing methods typically initialize particles from the Gaussian prior, which inadequately captures reward-relevant regions and results in reduced sampling efficiency. We demonstrate that initializing from the reward-aware posterior significantly improves alignment performance. To enable posterior sampling in high-dimensional latent spaces, we introduce the preconditioned Crank-Nicolson Langevin (pCNL) algorithm, which combines dimension-robust proposals with gradient-informed dynamics. In our experiments, this approach enables efficient and scalable posterior sampling and consistently improves performance across various reward alignment tasks, including layout-to-image generation, quantity-aware generation, and aesthetic-preference generation.
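To make the pCNL idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a standard Gaussian prior N(0, I) and a target of the form pi(x) ∝ exp(-phi(x)) N(x; 0, I), where `phi` stands in for a negative scaled reward. The function name `pcnl_sample`, its parameters, and the use of a plain Metropolis-Hastings ratio computed from explicit densities are all assumptions made for illustration. The proposal combines a Crank-Nicolson contraction of the current state, which keeps the Gaussian prior invariant in any dimension, with a gradient drift.

```python
import numpy as np


def pcnl_sample(phi, grad_phi, x0, n_steps=1000, delta=0.5, rng=None):
    """Sample from pi(x) ∝ exp(-phi(x)) * N(x; 0, I) with a pCNL-style chain.

    phi      : callable, tilt potential (hypothetically -reward(x) / alpha)
    grad_phi : callable, gradient of phi
    x0       : 1-D initial state
    delta    : step size, assumed to lie in (0, 2)
    """
    rng = np.random.default_rng() if rng is None else rng
    a = (2.0 - delta) / (2.0 + delta)   # Crank-Nicolson contraction factor
    b = 2.0 * delta / (2.0 + delta)     # weight on the gradient drift
    c2 = 1.0 - a * a                    # noise variance, equals 8*delta/(2+delta)^2

    def log_target(x):
        # Unnormalized log density: tilt plus standard Gaussian prior
        return -phi(x) - 0.5 * np.dot(x, x)

    def log_q(x_to, x_from, g_from):
        # Log proposal density N(x_to; a*x_from - b*g_from, c2*I), up to a constant
        diff = x_to - (a * x_from - b * g_from)
        return -0.5 * np.dot(diff, diff) / c2

    x = np.asarray(x0, dtype=float).copy()
    g = grad_phi(x)
    samples = np.empty((n_steps, x.size))
    n_accept = 0
    for t in range(n_steps):
        noise = rng.standard_normal(x.size)
        prop = a * x - b * g + np.sqrt(c2) * noise
        g_prop = grad_phi(prop)
        # Metropolis-Hastings log acceptance ratio
        log_alpha = (log_target(prop) + log_q(x, prop, g_prop)
                     - log_target(x) - log_q(prop, x, g))
        if rng.uniform() < np.exp(min(0.0, log_alpha)):
            x, g = prop, g_prop
            n_accept += 1
        samples[t] = x
    return samples, n_accept / n_steps
```

As a sanity check, a quadratic potential `phi(x) = |x - m|^2 / 2` gives a Gaussian posterior with mean `m / 2` and variance `1/2` per coordinate, so the empirical moments of the chain can be compared against known values. In an SMC pipeline such as the one the abstract describes, states drawn this way would serve as the initial particles in place of draws from the Gaussian prior.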
