
Scaling Image and Video Generation via Test-Time Evolutionary Search

May 23, 2025
Authors: Haoran He, Jiajun Liang, Xintao Wang, Pengfei Wan, Di Zhang, Kun Gai, Ling Pan
cs.AI

Abstract

As the marginal cost of scaling computation (data and parameters) during model pre-training continues to increase substantially, test-time scaling (TTS) has emerged as a promising direction for improving generative model performance by allocating additional computation at inference time. While TTS has demonstrated significant success across multiple language tasks, there remains a notable gap in understanding the test-time scaling behaviors of image and video generative models (diffusion-based or flow-based models). Although recent works have initiated exploration into inference-time strategies for vision tasks, these approaches face critical limitations: being constrained to task-specific domains, exhibiting poor scalability, or falling into reward over-optimization that sacrifices sample diversity. In this paper, we propose Evolutionary Search (EvoSearch), a novel, generalist, and efficient TTS method that effectively enhances the scalability of both image and video generation across diffusion and flow models, without requiring additional training or model expansion. EvoSearch reformulates test-time scaling for diffusion and flow models as an evolutionary search problem, leveraging principles from biological evolution to efficiently explore and refine the denoising trajectory. By incorporating carefully designed selection and mutation mechanisms tailored to the stochastic differential equation denoising process, EvoSearch iteratively generates higher-quality offspring while preserving population diversity. Through extensive evaluation across both diffusion and flow architectures for image and video generation tasks, we demonstrate that our method consistently outperforms existing approaches, achieves higher diversity, and shows strong generalizability to unseen evaluation metrics. Our project is available at the website https://tinnerhrhe.github.io/evosearch.
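
The selection-and-mutation loop described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: `toy_generate` and `toy_reward` are hypothetical stand-ins for a diffusion or flow sampler and a reward model, and the elite-based selection and the variance-preserving mutation used here are assumptions chosen only to show the general shape of a test-time evolutionary search over initial noise.

```python
import math
import random


def toy_generate(noise):
    # Hypothetical stand-in for a diffusion/flow sampler:
    # maps a noise vector to a "sample" deterministically.
    return [math.tanh(v) for v in noise]


def toy_reward(sample):
    # Hypothetical reward model: highest when every value is near 0.5.
    return -sum((v - 0.5) ** 2 for v in sample)


def mutate(noise, beta, rng):
    # Assumed mutation: blend the parent with fresh Gaussian noise so that
    # a unit-variance parent yields a unit-variance child.
    scale = math.sqrt(1.0 - beta * beta)
    return [scale * v + beta * rng.gauss(0.0, 1.0) for v in noise]


def evo_search(pop_size=16, dim=8, generations=10,
               elite_frac=0.25, beta=0.3, seed=0):
    rng = random.Random(seed)
    population = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(pop_size)]
    history = []  # best reward seen at the start of each generation

    def score(z):
        return toy_reward(toy_generate(z))

    for _ in range(generations):
        # Selection: rank candidates by reward and keep the top fraction.
        ranked = sorted(population, key=score, reverse=True)
        history.append(score(ranked[0]))
        n_elite = max(1, int(pop_size * elite_frac))
        elites = ranked[:n_elite]
        # Mutation: refill the population with perturbed copies of elites.
        children = [mutate(rng.choice(elites), beta, rng)
                    for _ in range(pop_size - n_elite)]
        population = elites + children

    best = max(population, key=score)
    history.append(score(best))
    return best, history


if __name__ == "__main__":
    best, history = evo_search()
    print("best reward per generation:", [round(h, 3) for h in history])
```

Because the elites are carried over unchanged and the toy reward is deterministic, the best reward in this sketch never decreases from one generation to the next. The extra inference-time compute is spent on the repeated generate-and-score calls rather than on any additional training.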
