Coarse-Guided Visual Generation via Weighted h-Transform Sampling

Abstract

Coarse-guided visual generation, which synthesizes fine visual samples from degraded or low-fidelity coarse references, is essential for various real-world applications. While training-based approaches are effective, they are inherently limited by the high training cost and restricted generalization that come with paired data collection. Accordingly, recent training-free works leverage pretrained diffusion models and incorporate guidance during the sampling process. However, these training-free methods either require knowledge of the forward (fine-to-coarse) transformation operator, e.g., bicubic downsampling, or struggle to balance guidance adherence against synthesis quality. To address these challenges, we propose a novel guidance method based on the h-transform, a tool that constrains a stochastic process (e.g., the sampling process) to satisfy desired conditions. Specifically, we modify the transition probability at each sampling timestep by adding a drift function to the original differential equation, which approximately steers generation toward the ideal fine sample. To handle the unavoidable approximation errors, we introduce a noise-level-aware schedule that gradually de-weights the drift term as the error increases, ensuring both guidance adherence and high-quality synthesis. Extensive experiments across diverse image and video generation tasks demonstrate the effectiveness and generalization of our method.
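
As a hedged illustration of the mechanism described in the abstract (which itself states no equations), the sketch below writes out the standard Doob h-transform for a generic diffusion, followed by one possible weighted variant. The symbols f, g, h, \hat{h}, the conditioning event Y, and the schedule \lambda(t) are notational assumptions introduced here for illustration and are not taken from the paper.

```latex
% Unconditioned diffusion in generic form (f: drift, g: diffusion coefficient; assumed notation).
\[
  \mathrm{d}x_t = f(x_t, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w_t
\]

% Doob's h-transform: conditioning the process on an event Y,
% with h(x, t) = P(Y \mid x_t = x), adds a drift term g^2 \nabla_x \log h.
\[
  \mathrm{d}x_t = \bigl[\, f(x_t, t) + g(t)^2\, \nabla_x \log h(x_t, t) \,\bigr]\,\mathrm{d}t
                  + g(t)\,\mathrm{d}w_t
\]

% Hypothetical weighted variant: h is replaced by an approximation \hat{h},
% and the added drift is scaled by a schedule \lambda(t) \in [0, 1].
\[
  \mathrm{d}x_t = \bigl[\, f(x_t, t) + \lambda(t)\, g(t)^2\, \nabla_x \log \hat{h}(x_t, t) \,\bigr]\,\mathrm{d}t
                  + g(t)\,\mathrm{d}w_t
\]
```

In this sketch, \lambda(t) = 1 with an exact h recovers the conditioned process, while \lambda(t) = 0 recovers unguided sampling. Lowering \lambda(t) where \hat{h} is less reliable corresponds to the de-weighting the abstract describes, but the actual form of the schedule and of the drift approximation is not given in this text. Sign and time conventions also differ for reverse-time samplers.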
