WildActor: Unconstrained Identity-Preserving Video Generation

Abstract

Production-ready human video generation requires digital actors to maintain strictly consistent full-body identities across dynamic shots, viewpoints and motions, a setting that remains challenging for existing methods. Prior methods often suffer from face-centric behavior that neglects body-level consistency, or produce copy-paste artifacts where subjects appear rigid due to pose locking. We present Actor-18M, a large-scale human video dataset designed to capture identity consistency under unconstrained viewpoints and environments. Actor-18M comprises 1.6M videos with 18M corresponding human images, covering both arbitrary views and canonical three-view representations. Leveraging Actor-18M, we propose WildActor, a framework for any-view conditioned human video generation. We introduce an Asymmetric Identity-Preserving Attention mechanism coupled with a Viewpoint-Adaptive Monte Carlo Sampling strategy that iteratively re-weights reference conditions by marginal utility for balanced manifold coverage. Evaluated on the proposed Actor-Bench, WildActor consistently preserves body identity under diverse shot compositions, large viewpoint transitions, and substantial motions, surpassing existing methods in these challenging settings.
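The abstract names the Viewpoint-Adaptive Monte Carlo Sampling strategy but does not describe how it works. The following is a hypothetical illustration of the general idea of re-weighting reference conditions by marginal utility. It is not the WildActor implementation. Several simplifications are assumed: each reference view is reduced to a single yaw angle in degrees, similarity between views uses a Gaussian kernel with an assumed `bandwidth`, and a candidate's marginal utility is one minus its similarity to the closest view already chosen. References are drawn one at a time, and the weights are recomputed after every draw. As a result, redundant viewpoints become unlikely and the chosen set spreads across the view range. All function names (`angular_distance`, `coverage_gain`, `sample_references`) are illustrative.

```python
# Hypothetical sketch: marginal-utility re-weighting over reference viewpoints.
# Not the WildActor code; the yaw-only view model and Gaussian kernel are assumptions.
import math
import random
from typing import List, Sequence


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two yaw angles, in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def coverage_gain(candidate: float, selected: Sequence[float], bandwidth: float) -> float:
    """Assumed marginal utility: 1 minus the highest kernel similarity to any selected view.

    A candidate identical to a selected view scores 0; one far from every
    selected view scores close to 1.
    """
    if not selected:
        return 1.0
    max_sim = max(
        math.exp(-(angular_distance(candidate, s) / bandwidth) ** 2) for s in selected
    )
    return 1.0 - max_sim


def sample_references(
    yaws: Sequence[float], k: int, bandwidth: float = 45.0, seed: int = 0
) -> List[int]:
    """Draw up to k reference indices without replacement, re-weighting after each draw."""
    rng = random.Random(seed)
    remaining = list(range(len(yaws)))
    chosen: List[int] = []
    for _ in range(min(k, len(yaws))):
        selected_yaws = [yaws[i] for i in chosen]
        # Recompute every remaining candidate's marginal utility given the current selection.
        weights = [coverage_gain(yaws[i], selected_yaws, bandwidth) for i in remaining]
        if sum(weights) <= 0.0:
            # Every remaining view is fully redundant; fall back to a uniform draw.
            weights = [1.0] * len(remaining)
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen
```

For example, a candidate identical to an already-chosen view receives weight 0. Because of this, `sample_references([0, 0, 0, 180], 2)` always returns index 3 (the 180° view) alongside one of the frontal views. A real system would likely condition on richer view or identity embeddings than a single yaw angle. That choice is also only an assumption here.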