iMontage: Unified, Versatile, Highly Dynamic Many-to-many Image Generation
November 25, 2025
Authors: Zhoujie Fu, Xianfang Zeng, Jinghong Lan, Xinyao Liao, Cheng Chen, Junyi Chen, Jiacheng Wei, Wei Cheng, Shiyu Liu, Yunuo Chen, Gang Yu, Guosheng Lin
cs.AI
Abstract
Pre-trained video models learn powerful priors for generating high-quality, temporally coherent content. While these models excel at temporal coherence, their dynamics are often constrained by the continuous nature of their training data. We hypothesize that by injecting the rich and unconstrained content diversity from image data into this coherent temporal framework, we can generate image sets that feature both natural transitions and a far more expansive dynamic range. To this end, we introduce iMontage, a unified framework designed to repurpose a powerful video model into an all-in-one image generator. The framework consumes and produces variable-length image sets, unifying a wide array of image generation and editing tasks. To achieve this, we propose an elegant and minimally invasive adaptation strategy, complemented by a tailored data curation process and training paradigm. This approach allows the model to acquire broad image manipulation capabilities without corrupting its invaluable original motion priors. iMontage excels across several mainstream many-in-many-out tasks, not only maintaining strong cross-image contextual consistency but also generating scenes with extraordinary dynamics that surpass conventional scopes. Find our homepage at: https://kr1sjfu.github.io/iMontage-web/.