WildActor: Unconstrained Identity-Preserving Video Generation

Abstract

Production-ready human video generation requires digital actors to maintain strictly consistent full-body identities across dynamic shots, viewpoints, and motions, a setting that remains challenging for existing methods. Prior methods tend to be face-centric and neglect body-level consistency, or they produce copy-paste artifacts in which subjects appear rigid because of pose locking. We present Actor-18M, a large-scale human video dataset designed to capture identity consistency under unconstrained viewpoints and environments. Actor-18M comprises 1.6M videos with 18M corresponding human images, covering both arbitrary views and canonical three-view representations. Building on Actor-18M, we propose WildActor, a framework for any-view conditioned human video generation. WildActor introduces an Asymmetric Identity-Preserving Attention mechanism coupled with a Viewpoint-Adaptive Monte Carlo Sampling strategy that iteratively re-weights reference conditions by their marginal utility to achieve balanced manifold coverage. Evaluated on the proposed Actor-Bench, WildActor consistently preserves body identity under diverse shot compositions, large viewpoint transitions, and substantial motions, surpassing existing methods in these challenging settings.
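
To make the idea of re-weighting reference conditions by marginal utility more concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes each reference image is described only by a camera azimuth in degrees, and it defines marginal utility as the angular gap to the nearest already-selected view, capped at a bandwidth. The names `marginal_utility` and `sample_references`, the `bandwidth` parameter, and the utility definition itself are all hypothetical choices made for illustration.

```python
import random


def marginal_utility(azimuth, chosen, bandwidth=45.0):
    """Hypothetical utility: angular gap (degrees) to the nearest chosen view, capped."""
    if not chosen:
        return bandwidth
    # Wrap differences into [-180, 180) so that 350 and 10 degrees are 20 apart.
    nearest = min(abs((azimuth - c + 180.0) % 360.0 - 180.0) for c in chosen)
    return min(nearest, bandwidth)


def sample_references(azimuths, k, seed=0):
    """Draw up to k reference indices, re-weighting candidates after every draw."""
    rng = random.Random(seed)
    remaining = list(range(len(azimuths)))
    chosen_idx = []
    while remaining and len(chosen_idx) < k:
        chosen_az = [azimuths[i] for i in chosen_idx]
        # Assumption: sampling weight equals the candidate's current marginal utility.
        weights = [marginal_utility(azimuths[i], chosen_az) for i in remaining]
        if sum(weights) == 0:
            # Every remaining view duplicates a chosen one; fall back to uniform sampling.
            weights = [1.0] * len(remaining)
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen_idx.append(pick)
        remaining.remove(pick)
    return chosen_idx


# Four near-duplicate frontal views and a single back view.
views = [0.0, 0.0, 0.0, 0.0, 180.0]
picked = sample_references(views, k=2)
```

In this toy setup the redundant frontal views receive zero weight once one of them has been selected, so the two sampled references always include the back view. This is the kind of balanced viewpoint coverage that the abstract attributes to the sampling strategy, although the actual utility function and viewpoint representation used by WildActor are not specified here.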