
Concat-ID: Towards Universal Identity-Preserving Video Synthesis

March 18, 2025
Authors: Yong Zhong, Zhuoyi Yang, Jiayan Teng, Xiaotao Gu, Chongxuan Li
cs.AI

Abstract

We present Concat-ID, a unified framework for identity-preserving video generation. Concat-ID employs Variational Autoencoders to extract image features, which are concatenated with video latents along the sequence dimension, leveraging solely 3D self-attention mechanisms without the need for additional modules. A novel cross-video pairing strategy and a multi-stage training regimen are introduced to balance identity consistency and facial editability while enhancing video naturalness. Extensive experiments demonstrate Concat-ID's superiority over existing methods in both single and multi-identity generation, as well as its seamless scalability to multi-subject scenarios, including virtual try-on and background-controllable generation. Concat-ID establishes a new benchmark for identity-preserving video synthesis, providing a versatile and scalable solution for a wide range of applications.
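To make the described mechanism concrete, here is a minimal sketch in PyTorch of the core idea from the abstract: VAE-encoded reference-image latents are concatenated with the video latents along the sequence dimension, and a single joint self-attention pass conditions the video tokens on the identity tokens without any extra cross-attention or adapter modules. This is not the authors' implementation; the class name, shapes, and the use of a plain multi-head attention layer (standing in for the model's 3D self-attention over flattened spatio-temporal tokens) are illustrative assumptions.

```python
# Sketch of sequence-dimension concatenation + joint self-attention (assumed shapes/modules).
import torch
import torch.nn as nn


class ConcatIDAttentionSketch(nn.Module):
    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, video_latents: torch.Tensor, id_latents: torch.Tensor) -> torch.Tensor:
        # video_latents: (batch, T_video, dim) -- flattened spatio-temporal video tokens
        # id_latents:    (batch, T_id, dim)    -- VAE-encoded reference-image tokens
        x = torch.cat([video_latents, id_latents], dim=1)  # concatenate along the sequence dim
        x = self.norm(x)
        out, _ = self.attn(x, x, x)  # joint self-attention over video + identity tokens
        # Keep only the video portion; the identity tokens act purely as conditioning context.
        return out[:, : video_latents.shape[1], :]


# Usage: identity information flows into the video tokens through self-attention alone.
video = torch.randn(2, 1024, 512)    # example flattened video latent tokens
identity = torch.randn(2, 256, 512)  # example flattened reference-image latent tokens
out = ConcatIDAttentionSketch()(video, identity)
print(out.shape)  # torch.Size([2, 1024, 512])
```

Because conditioning happens inside the existing self-attention, the approach adds no new parameters beyond what the backbone already has, which is what the abstract means by "without the need for additional modules."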
