
ContextAnyone: Context-Aware Diffusion for Character-Consistent Text-to-Video Generation

December 8, 2025
Authors: Ziyang Mai, Yu-Wing Tai
cs.AI

Abstract

Text-to-video (T2V) generation has advanced rapidly, yet maintaining consistent character identities across scenes remains a major challenge. Existing personalization methods often focus on facial identity but fail to preserve broader contextual cues such as hairstyle, outfit, and body shape, which are critical for visual coherence. We propose ContextAnyone, a context-aware diffusion framework that achieves character-consistent video generation from text and a single reference image. Our method jointly reconstructs the reference image and generates new video frames, enabling the model to fully perceive and utilize reference information. Reference information is effectively integrated into a DiT-based diffusion backbone through a novel Emphasize-Attention module that selectively reinforces reference-aware features and prevents identity drift across frames. A dual-guidance loss combines diffusion and reference reconstruction objectives to enhance appearance fidelity, while the proposed Gap-RoPE positional embedding separates reference and video tokens to stabilize temporal modeling. Experiments demonstrate that ContextAnyone outperforms existing reference-to-video methods in identity consistency and visual quality, generating coherent and context-preserving character videos across diverse motions and scenes. Project page: https://github.com/ziyang1106/ContextAnyone.
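
To make the two training-time ideas in the abstract more concrete, the sketch below illustrates (a) a dual-guidance loss that adds a reference-reconstruction term to the usual diffusion noise-prediction loss, and (b) the "gap" intuition behind Gap-RoPE, where reference tokens receive positional indices separated from the video-token indices by a fixed offset. Everything here (function names, the token layout with reference tokens first, the weight `lambda_ref`, and the gap size) is an illustrative assumption, not the authors' implementation.

```python
# Illustrative sketch only: the token layout (reference tokens first, video
# tokens second), the weight `lambda_ref`, and the gap size are assumptions,
# not details taken from the ContextAnyone paper.
import torch
import torch.nn.functional as F


def dual_guidance_loss(model_out: torch.Tensor,
                       target_noise: torch.Tensor,
                       ref_latent: torch.Tensor,
                       num_ref_tokens: int,
                       lambda_ref: float = 0.5) -> torch.Tensor:
    """Combine a diffusion (noise-prediction) loss on video tokens with a
    reconstruction loss on the jointly denoised reference-image tokens.

    model_out:    (B, N_ref + N_vid, D) predictions over the joint sequence
    target_noise: (B, N_vid, D) noise that was added to the video tokens
    ref_latent:   (B, N_ref, D) clean latent of the reference image
    """
    ref_pred = model_out[:, :num_ref_tokens]   # reference-token predictions
    vid_pred = model_out[:, num_ref_tokens:]   # video-token predictions

    diffusion_loss = F.mse_loss(vid_pred, target_noise)  # standard noise-prediction term
    recon_loss = F.mse_loss(ref_pred, ref_latent)        # reference-reconstruction term
    return diffusion_loss + lambda_ref * recon_loss


def gap_rope_positions(num_ref_tokens: int, num_video_tokens: int,
                       gap: int = 256) -> torch.Tensor:
    """Assign 1-D positional indices so that reference tokens and video tokens
    are separated by a fixed gap before rotary embeddings are applied."""
    ref_pos = torch.arange(num_ref_tokens)
    vid_pos = torch.arange(num_video_tokens) + num_ref_tokens + gap
    return torch.cat([ref_pos, vid_pos])


if __name__ == "__main__":
    B, N_ref, N_vid, D = 2, 16, 64, 32
    out = torch.randn(B, N_ref + N_vid, D)
    noise = torch.randn(B, N_vid, D)
    ref = torch.randn(B, N_ref, D)
    print("loss:", dual_guidance_loss(out, noise, ref, num_ref_tokens=N_ref).item())
    print("positions:", gap_rope_positions(4, 8, gap=16))
```

Keeping the reference tokens in the same sequence as the video tokens mirrors the abstract's description of jointly reconstructing the reference image while denoising the video, while the positional gap prevents the reference from being treated as just another video frame in time.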