Character Mixing for Video Generation

Abstract

Imagine Mr. Bean stepping into Tom and Jerry: can we generate videos where characters interact naturally across different worlds? We study inter-character interaction in text-to-video generation, where the key challenge is to preserve each character's identity and behaviors while enabling coherent cross-context interaction. This is difficult because the characters may never have coexisted and because mixing styles often causes style delusion, where realistic characters appear cartoonish or vice versa. We introduce a framework that tackles these issues with Cross-Character Embedding (CCE), which learns identity and behavioral logic across multimodal sources, and Cross-Character Augmentation (CCA), which enriches training with synthetic co-existence and mixed-style data. Together, these techniques allow natural interactions between characters that have never coexisted, without losing stylistic fidelity. Experiments on a curated benchmark of 10 characters drawn from cartoons and live-action series show clear improvements in identity preservation, interaction quality, and robustness to style delusion, enabling new forms of generative storytelling. Additional results and videos are available on our project page: https://tingtingliao.github.io/mimix/.
