Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation
October 24, 2025
作者: Yifu Luo, Penghui Du, Bo Li, Sinan Du, Tiantian Zhang, Yongzhe Chang, Kai Wu, Kun Gai, Xueqian Wang
cs.AI
Abstract
Group Relative Policy Optimization (GRPO) has shown strong potential for flow-matching-based text-to-image (T2I) generation, but it faces two key limitations: inaccurate advantage attribution and neglect of the temporal dynamics of generation. In this work, we argue that shifting the optimization paradigm from the step level to the chunk level can effectively alleviate these issues. Building on this idea, we propose Chunk-GRPO, the first chunk-level GRPO-based approach for T2I generation. The key insight is to group consecutive sampling steps into coherent chunks that capture the intrinsic temporal dynamics of flow matching, and to optimize the policy at the chunk level. In addition, we introduce an optional weighted sampling strategy to further enhance performance. Extensive experiments show that Chunk-GRPO achieves superior results in both preference alignment and image quality, highlighting the promise of chunk-level optimization for GRPO-based methods.
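To make the core idea concrete, here is a minimal sketch (not the paper's implementation) of what chunk-level credit assignment in a GRPO-style update could look like: a group-relative advantage is computed per generated image, and each sampling step is mapped to a chunk so the update is attributed per chunk rather than per individual step. The function name, the scalar-reward-per-image assumption, and the hand-picked `chunk_boundaries` are all hypothetical illustrations.

```python
import torch

def chunk_level_grpo_advantages(rewards, num_steps, chunk_boundaries):
    """Hypothetical sketch of chunk-level advantage attribution.

    rewards:          (G,) tensor, one scalar reward per image in the group
    num_steps:        total number of flow-matching sampling steps
    chunk_boundaries: step indices closing each chunk, e.g. [10, 20, 30]
    """
    # Standard GRPO-style group-relative advantage: normalize each sample's
    # reward against the group mean and standard deviation.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)   # (G,)

    # Map every sampling step to the chunk that contains it, so log-probs
    # (and hence the policy-gradient update) can be aggregated per chunk
    # instead of being treated as independent per-step terms.
    step_to_chunk = torch.zeros(num_steps, dtype=torch.long)
    start = 0
    for chunk_id, end in enumerate(chunk_boundaries):
        step_to_chunk[start:end] = chunk_id
        start = end

    return adv, step_to_chunk

# Example usage with 4 samples, 30 steps, and 3 chunks (all values illustrative):
rewards = torch.tensor([0.7, 0.4, 0.9, 0.5])
adv, step_to_chunk = chunk_level_grpo_advantages(rewards, num_steps=30,
                                                 chunk_boundaries=[10, 20, 30])
```

In an actual training loop, the per-chunk aggregation would be applied to the summed log-probabilities of the steps inside each chunk before multiplying by the advantage; the chunking scheme the paper uses to capture flow-matching temporal dynamics is not specified in this abstract, so the uniform boundaries above are purely a placeholder.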