ONE-SHOT: Compositional Human-Environment Video Synthesis via Spatial-Decoupled Motion Injection and Hybrid Context Integration

April 1, 2026
Authors: Fengyuan Yang, Luying Huang, Jiazhi Guan, Quanwei Yang, Dongwei Pan, Jianglin Fu, Haocheng Feng, Wei He, Kaisiyuan Wang, Hang Zhou, Angela Yao
cs.AI

Abstract

Recent advances in Video Foundation Models (VFMs) have revolutionized human-centric video synthesis, yet fine-grained, independent editing of subjects and scenes remains a critical challenge. Existing attempts to add richer environment control through rigid 3D geometric compositions often face a stark trade-off between precise control and generative flexibility. Furthermore, heavy 3D pre-processing still limits practical scalability. In this paper, we propose ONE-SHOT, a parameter-efficient framework for compositional human-environment video generation. Our key insight is to factorize the generative process into disentangled signals. Specifically, we introduce a canonical-space injection mechanism that decouples human dynamics from environmental cues via cross-attention. We also propose Dynamic-Grounded-RoPE, a novel positional embedding strategy that establishes correspondences between disparate spatial domains without any heuristic 3D alignment. To support long-horizon synthesis, we introduce a Hybrid Context Integration mechanism that maintains subject and scene consistency across minute-level generations. Experiments demonstrate that our method significantly outperforms state-of-the-art methods in video synthesis, offering superior structural control and creative diversity. The project page is available at: https://martayang.github.io/ONE-SHOT/.