Character Mixing for Video Generation
October 6, 2025
Authors: Tingting Liao, Chongjian Ge, Guangyi Liu, Hao Li, Yi Zhou
cs.AI
Abstract
Imagine Mr. Bean stepping into Tom and Jerry--can we generate videos where
characters interact naturally across different worlds? We study inter-character
interaction in text-to-video generation, where the key challenge is to preserve
each character's identity and behaviors while enabling coherent cross-context
interaction. This is difficult because characters may never have coexisted and
because mixing styles often causes style delusion, where realistic characters
appear cartoonish or vice versa. We introduce a framework that tackles these
issues with Cross-Character Embedding (CCE), which learns identity and
behavioral logic across multimodal sources, and Cross-Character Augmentation
(CCA), which enriches training with synthetic co-existence and mixed-style
data. Together, these techniques enable natural interactions between characters
that never coexisted, without losing stylistic fidelity. Experiments on a
curated benchmark of cartoons and live-action series with 10 characters show
clear improvements in identity preservation, interaction quality, and
robustness to style delusion, enabling new forms of generative
storytelling. Additional results and videos are available on our project page:
https://tingtingliao.github.io/mimix/.