Ideas in Inference-time Scaling can Benefit Generative Pre-training Algorithms
March 10, 2025
Authors: Jiaming Song, Linqi Zhou
cs.AI
Abstract
Recent years have seen significant advancements in foundation models through
generative pre-training, yet algorithmic innovation in this space has largely
stagnated around autoregressive models for discrete signals and diffusion
models for continuous signals. This stagnation creates a bottleneck that
prevents us from fully unlocking the potential of rich multi-modal data, which
in turn limits the progress on multimodal intelligence. We argue that an
inference-first perspective, which prioritizes scaling efficiency during
inference time across sequence length and refinement steps, can inspire novel
generative pre-training algorithms. Using Inductive Moment Matching (IMM) as a
concrete example, we demonstrate how addressing limitations in diffusion
models' inference process through targeted modifications yields a stable,
single-stage algorithm that achieves superior sample quality with over an order
of magnitude greater inference efficiency.
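To make the refinement-step trade-off concrete, here is a minimal sketch of the inference behavior the abstract describes. It is not the paper's IMM algorithm; the toy 1-D Gaussian target and the `denoise` and `sample` helpers below are illustrative assumptions. A DDIM-style deterministic sampler with an analytically exact denoiser recovers the target distribution only as the number of refinement steps grows, while a single step collapses toward the mean, which is the limitation that few-step, inference-first algorithms aim to remove.

```python
import numpy as np

# Toy data distribution: N(3, 0.5^2).  Because the target is Gaussian,
# the ideal denoiser is available in closed form, so no training is needed.
TARGET_MEAN, TARGET_STD = 3.0, 0.5


def denoise(x_t, t):
    """Posterior mean E[x_0 | x_t] for x_t = x_0 + t * eps with Gaussian x_0."""
    var_x0, var_noise = TARGET_STD ** 2, t ** 2
    w = var_x0 / (var_x0 + var_noise)
    return w * x_t + (1 - w) * TARGET_MEAN


def sample(num_steps, n=10_000, seed=0):
    """DDIM-style deterministic sampling with `num_steps` refinement steps."""
    rng = np.random.default_rng(seed)
    ts = np.linspace(1.0, 0.0, num_steps + 1)  # noise-level schedule 1 -> 0
    # Start from the exact marginal at the highest noise level t = 1.
    x = TARGET_MEAN + rng.standard_normal(n) * np.sqrt(TARGET_STD ** 2 + ts[0] ** 2)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x0_hat = denoise(x, t_cur)          # predict the clean sample
        eps_hat = (x - x0_hat) / t_cur      # implied noise direction
        x = x0_hat + t_next * eps_hat       # jump to the next noise level
    return x


# With 1 step the sample std is far below the target 0.5 (mean collapse);
# it approaches 0.5 only as more refinement steps are spent at inference.
for steps in (1, 2, 8, 64):
    x = sample(steps)
    print(f"{steps:>3} steps: mean={x.mean():+.3f}  std={x.std():.3f}")
```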