Stand-In: A Lightweight and Plug-and-Play Identity Control for Video Generation
August 11, 2025
作者: Bowen Xue, Qixin Yan, Wenjing Wang, Hao Liu, Chen Li
cs.AI
Abstract
Generating high-fidelity human videos that match user-specified identities is
important yet challenging in the field of generative AI. Existing methods often
rely on an excessive number of training parameters and lack compatibility with
other AIGC tools. In this paper, we propose Stand-In, a lightweight and
plug-and-play framework for identity preservation in video generation.
Specifically, we introduce a conditional image branch into the pre-trained
video generation model. Identity control is achieved through restricted
self-attention with conditional position mapping, and can be learned quickly
with only 2000 pairs. Despite incorporating and training just ~1%
additional parameters, our framework achieves excellent results in video
quality and identity preservation, outperforming other full-parameter training
methods. Moreover, our framework can be seamlessly integrated for other tasks,
such as subject-driven video generation, pose-referenced video generation,
stylization, and face swapping.
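The mechanism described above can be sketched as follows. This is a minimal, hypothetical illustration inferred from the abstract, not the authors' implementation: video-branch queries attend over the concatenation of video and condition-image keys/values (the condition branch would attend only to its own tokens, hence "restricted"), and the image tokens receive position ids offset beyond the video range ("conditional position mapping"). All function and variable names here are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def restricted_self_attention(q_vid, k_vid, v_vid, k_img, v_img):
    """Video queries attend over concatenated [video; condition-image]
    keys/values; the condition branch (omitted here) would attend only
    to its own tokens, keeping the extra branch lightweight."""
    k = np.concatenate([k_vid, k_img], axis=0)
    v = np.concatenate([v_vid, v_img], axis=0)
    scores = q_vid @ k.T / np.sqrt(q_vid.shape[-1])
    return softmax(scores) @ v

def conditional_position_ids(n_video, n_image, image_offset=1000):
    # Hypothetical position mapping: image tokens get ids shifted past
    # the video range so the model can distinguish the two streams.
    return np.arange(n_video), image_offset + np.arange(n_image)

# Toy usage with random projections standing in for learned Q/K/V.
rng = np.random.default_rng(0)
d = 16
q_vid = rng.standard_normal((8, d))   # 8 video tokens
k_vid = rng.standard_normal((8, d))
v_vid = rng.standard_normal((8, d))
k_img = rng.standard_normal((4, d))   # 4 condition-image tokens
v_img = rng.standard_normal((4, d))

out = restricted_self_attention(q_vid, k_vid, v_vid, k_img, v_img)
vid_ids, img_ids = conditional_position_ids(8, 4)
```

The output keeps the video token count and width, so the conditioned attention is a drop-in replacement for the pre-trained model's self-attention, consistent with the plug-and-play claim.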