Character Mixing for Video Generation
October 6, 2025
Authors: Tingting Liao, Chongjian Ge, Guangyi Liu, Hao Li, Yi Zhou
cs.AI
Abstract
Imagine Mr. Bean stepping into Tom and Jerry--can we generate videos where
characters interact naturally across different worlds? We study inter-character
interaction in text-to-video generation, where the key challenge is to preserve
each character's identity and behaviors while enabling coherent cross-context
interaction. This is difficult because characters may never have coexisted and
because mixing styles often causes style delusion, where realistic characters
appear cartoonish or vice versa. We introduce a framework that tackles these
issues with Cross-Character Embedding (CCE), which learns identity and
behavioral logic across multimodal sources, and Cross-Character Augmentation
(CCA), which enriches training with synthetic co-existence and mixed-style
data. Together, these techniques allow natural interactions between characters
that have never coexisted, without losing stylistic fidelity. Experiments on a
curated benchmark of cartoons and live-action series with 10 characters show
clear improvements in identity preservation, interaction quality, and
robustness to style delusion, enabling new forms of generative
storytelling. Additional results and videos are available on our project page:
https://tingtingliao.github.io/mimix/.
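
To give a concrete sense of the Cross-Character Augmentation (CCA) idea mentioned above, the minimal Python sketch below shows how synthetic co-existence and mixed-style training pairs might be assembled: clips of characters that never appear together are sampled jointly, and the paired caption keeps each character's own visual style explicit. All class names, fields, and prompt wording here are hypothetical illustrations; the abstract does not describe the authors' actual data pipeline.

```python
# Hypothetical sketch of a cross-character augmentation step, inspired by the
# CCA idea in the abstract (synthetic co-existence + mixed-style pairs).
# Names and fields are illustrative, not the authors' implementation.
import random
from dataclasses import dataclass


@dataclass
class Clip:
    character: str   # e.g. "Mr. Bean", "Tom"
    style: str       # e.g. "live-action", "cartoon"
    frames: list     # placeholder for decoded video frames


def make_coexistence_pair(clips: list[Clip]) -> tuple[Clip, Clip]:
    """Sample two clips featuring different characters, giving the model a
    synthetic 'co-existence' example even if the characters never shared
    footage in the source material."""
    a, b = random.sample(clips, 2)
    while a.character == b.character:
        a, b = random.sample(clips, 2)
    return a, b


def make_mixed_style_caption(a: Clip, b: Clip) -> str:
    """Build a text prompt that pairs the two characters while naming each
    one's own style, one simple way to discourage style delusion."""
    return (f"{a.character} (rendered in {a.style} style) interacts with "
            f"{b.character} (rendered in {b.style} style) in the same scene.")


if __name__ == "__main__":
    bank = [
        Clip("Mr. Bean", "live-action", []),
        Clip("Tom", "cartoon", []),
        Clip("Jerry", "cartoon", []),
    ]
    a, b = make_coexistence_pair(bank)
    print(make_mixed_style_caption(a, b))
```

The style-explicit caption targets the robustness issue the abstract calls style delusion: by keeping each character's style named in the synthetic mixed-style pair, the training signal pushes against homogenizing a realistic character into a cartoon look or vice versa. How the actual framework combines this with Cross-Character Embedding is left to the paper itself.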