iMontage: Unified, Versatile, Highly Dynamic Many-to-many Image Generation

November 25, 2025
作者: Zhoujie Fu, Xianfang Zeng, Jinghong Lan, Xinyao Liao, Cheng Chen, Junyi Chen, Jiacheng Wei, Wei Cheng, Shiyu Liu, Yunuo Chen, Gang Yu, Guosheng Lin
cs.AI

Abstract

Pre-trained video models learn powerful priors for generating high-quality, temporally coherent content. While these models excel at temporal coherence, their dynamics are often constrained by the continuous nature of their training data. We hypothesize that by injecting the rich and unconstrained content diversity from image data into this coherent temporal framework, we can generate image sets that feature both natural transitions and a far more expansive dynamic range. To this end, we introduce iMontage, a unified framework designed to repurpose a powerful video model into an all-in-one image generator. The framework consumes and produces variable-length image sets, unifying a wide array of image generation and editing tasks. To achieve this, we propose an elegant and minimally invasive adaptation strategy, complemented by a tailored data curation process and training paradigm. This approach allows the model to acquire broad image manipulation capabilities without corrupting its invaluable original motion priors. iMontage excels across several mainstream many-in-many-out tasks, not only maintaining strong cross-image contextual consistency but also generating scenes with extraordinary dynamics that surpass conventional scopes. Find our homepage at: https://kr1sjfu.github.io/iMontage-web/.