Ideas in Inference-time Scaling can Benefit Generative Pre-training Algorithms

Abstract

Recent years have seen significant advances in foundation models through generative pre-training, yet algorithmic innovation in this space has largely stagnated around autoregressive models for discrete signals and diffusion models for continuous signals. This stagnation creates a bottleneck that prevents us from fully unlocking the potential of rich multimodal data, which in turn limits progress on multimodal intelligence. We argue that an inference-first perspective, which prioritizes scaling efficiency during inference time across sequence length and refinement steps, can inspire novel generative pre-training algorithms. Using Inductive Moment Matching (IMM) as a concrete example, we demonstrate how addressing limitations in diffusion models' inference process through targeted modifications yields a stable, single-stage algorithm that achieves superior sample quality with over an order of magnitude greater inference efficiency.

