Stand-In: A Lightweight and Plug-and-Play Identity Control for Video Generation
August 11, 2025
作者: Bowen Xue, Qixin Yan, Wenjing Wang, Hao Liu, Chen Li
cs.AI
Abstract
Generating high-fidelity human videos that match user-specified identities is
important yet challenging in the field of generative AI. Existing methods often
rely on an excessive number of training parameters and lack compatibility with
other AIGC tools. In this paper, we propose Stand-In, a lightweight and
plug-and-play framework for identity preservation in video generation.
Specifically, we introduce a conditional image branch into the pre-trained
video generation model. Identity control is achieved through restricted
self-attention with conditional position mapping, and can be learned quickly
with only 2000 pairs. Despite incorporating and training just ~1%
additional parameters, our framework achieves excellent results in video
quality and identity preservation, outperforming other full-parameter training
methods. Moreover, our framework can be seamlessly integrated for other tasks,
such as subject-driven video generation, pose-referenced video generation,
stylization, and face swapping.
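The mechanism described above can be sketched as follows. This is a minimal, hypothetical illustration inferred from the abstract, not the authors' implementation: video-branch queries attend over the concatenation of video and condition-image keys/values (the condition branch would attend only to its own tokens, hence "restricted"), and the image tokens receive position ids offset beyond the video range ("conditional position mapping"). All function and variable names here are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def restricted_self_attention(q_vid, k_vid, v_vid, k_img, v_img):
    """Video queries attend over concatenated [video; condition-image]
    keys/values; the condition branch (omitted here) would attend only
    to its own tokens, keeping the extra branch lightweight."""
    k = np.concatenate([k_vid, k_img], axis=0)
    v = np.concatenate([v_vid, v_img], axis=0)
    scores = q_vid @ k.T / np.sqrt(q_vid.shape[-1])
    return softmax(scores) @ v

def conditional_position_ids(n_video, n_image, image_offset=1000):
    # Hypothetical position mapping: image tokens get ids shifted past
    # the video range so the model can distinguish the two streams.
    return np.arange(n_video), image_offset + np.arange(n_image)

# Toy usage with random projections standing in for learned Q/K/V.
rng = np.random.default_rng(0)
d = 16
q_vid = rng.standard_normal((8, d))   # 8 video tokens
k_vid = rng.standard_normal((8, d))
v_vid = rng.standard_normal((8, d))
k_img = rng.standard_normal((4, d))   # 4 condition-image tokens
v_img = rng.standard_normal((4, d))

out = restricted_self_attention(q_vid, k_vid, v_vid, k_img, v_img)
vid_ids, img_ids = conditional_position_ids(8, 4)
```

The output keeps the video token count and width, so the conditioned attention is a drop-in replacement for the pre-trained model's self-attention, consistent with the plug-and-play claim.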