Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation
March 17, 2025
Authors: Yihong Luo, Tianyang Hu, Weijian Luo, Kenji Kawaguchi, Jing Tang
cs.AI
Abstract
Aligning generated images with complicated text prompts and human preferences is a central challenge in Artificial Intelligence-Generated Content (AIGC). With reward-enhanced diffusion distillation emerging as a promising approach for boosting the controllability and fidelity of text-to-image models, we identify a fundamental paradigm shift: as conditions become more specific and reward signals grow stronger, the rewards themselves become the dominant force in generation, while the diffusion losses serve as an overly expensive form of regularization. To thoroughly validate this hypothesis, we introduce R0, a novel conditional generation approach based on regularized reward maximization. Instead of relying on tricky diffusion distillation losses, R0 proposes a new perspective that treats image generation as an optimization problem in data space, searching for valid images with high compositional rewards. Through an innovative generator parameterization and proper regularization techniques, we train state-of-the-art few-step text-to-image generative models with R0 at scale. Our results challenge the conventional wisdom of diffusion post-training and conditional generation by demonstrating that rewards play a dominant role in scenarios with complex conditions. We hope these findings encourage further research into human-centric and reward-centric generation paradigms across the broader field of AIGC. Code is available at https://github.com/Luo-Yihong/R0.
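The abstract frames few-step generation as regularized reward maximization directly in data space. The sketch below illustrates that idea under simplifying assumptions: `generator`, `reward_model`, `regularizer`, and the weight `lam` are hypothetical placeholders standing in for R0's actual parameterization, reward models, and regularization, which are not specified here.

```python
# Minimal sketch of regularized reward maximization for a few-step generator.
# All modules below are placeholders, not R0's exact design.
import torch

def training_step(generator, reward_model, regularizer, prompts, optimizer,
                  lam=0.1, latent_dim=64):
    """One optimization step: maximize the reward of generated images while a
    regularization term keeps them close to the valid-image manifold."""
    noise = torch.randn(len(prompts), latent_dim)    # random latents
    images = generator(noise, prompts)               # one- or few-step generation
    reward = reward_model(images, prompts).mean()    # e.g. a compositional/preference reward
    reg = regularizer(images).mean()                 # stand-in for R0's regularization
    loss = -reward + lam * reg                       # regularized reward maximization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this view, the generator is updated purely by reward gradients plus regularization; no diffusion distillation loss appears in the objective, which is the paradigm shift the abstract argues for.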