
MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos

December 11, 2025
Authors: Kehong Gong, Zhengyu Wen, Weixia He, Mingxi Xu, Qi Wang, Ning Zhang, Zhengyu Li, Dongze Lian, Wei Zhao, Xiaoyu He, Mingyuan Zhang
cs.AI

Abstract

Motion capture now underpins content creation far beyond digital humans, yet most existing pipelines remain species- or template-specific. We formalize this gap as Category-Agnostic Motion Capture (CAMoCap): given a monocular video and an arbitrary rigged 3D asset as a prompt, the goal is to reconstruct a rotation-based animation such as BVH that directly drives the specific asset. We present MoCapAnything, a reference-guided, factorized framework that first predicts 3D joint trajectories and then recovers asset-specific rotations via constraint-aware inverse kinematics. The system contains three learnable modules and a lightweight IK stage: (1) a Reference Prompt Encoder that extracts per-joint queries from the asset's skeleton, mesh, and rendered images; (2) a Video Feature Extractor that computes dense visual descriptors and reconstructs a coarse 4D deforming mesh to bridge the gap between video and joint space; and (3) a Unified Motion Decoder that fuses these cues to produce temporally coherent trajectories. We also curate Truebones Zoo with 1038 motion clips, each providing a standardized skeleton-mesh-render triad. Experiments on both in-domain benchmarks and in-the-wild videos show that MoCapAnything delivers high-quality skeletal animations and exhibits meaningful cross-species retargeting across heterogeneous rigs, enabling scalable, prompt-driven 3D motion capture for arbitrary assets. Project page: https://animotionlab.github.io/MoCapAnything/
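The second stage described above, turning predicted 3D joint positions into rotations for a specific rig, can be illustrated with a deliberately simplified sketch. This is not the paper's constraint-aware IK: it assumes a single bone, ignores twist, joint limits, and the parent hierarchy, and all function names (`shortest_arc_quaternion`, `rotate`, `bone_rotation`) are hypothetical. It only shows the basic idea of finding the rotation that swings a bone's rest-pose direction onto the direction implied by two predicted joint positions.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / n for c in v)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def shortest_arc_quaternion(src, dst):
    """Unit quaternion (w, x, y, z) rotating direction src onto direction dst."""
    a, b = normalize(src), normalize(dst)
    d = dot(a, b)
    if d < -1.0 + 1e-9:
        # Opposite directions: rotate 180 degrees about any axis perpendicular to a.
        helper = (1.0, 0.0, 0.0) if abs(a[0]) < 0.9 else (0.0, 1.0, 0.0)
        axis = normalize(cross(a, helper))
        return (0.0, axis[0], axis[1], axis[2])
    c = cross(a, b)
    q = (1.0 + d, c[0], c[1], c[2])
    n = math.sqrt(sum(x * x for x in q))
    return tuple(x / n for x in q)

def rotate(q, v):
    """Rotate vector v by unit quaternion q = (w, x, y, z)."""
    w, u = q[0], q[1:]
    uv = cross(u, v)
    uuv = cross(u, uv)
    return tuple(v[i] + 2.0 * (w * uv[i] + uuv[i]) for i in range(3))

def bone_rotation(rest_offset, parent_pos, child_pos):
    """Swing rotation aligning a bone's rest offset with predicted joint positions.

    rest_offset: child-minus-parent offset in the asset's rest pose (assumed input).
    parent_pos, child_pos: predicted 3D joint positions for one frame (assumed input).
    """
    target = tuple(c - p for c, p in zip(child_pos, parent_pos))
    return shortest_arc_quaternion(rest_offset, target)

if __name__ == "__main__":
    # A bone that points along +Y at rest, observed pointing along +X.
    q = bone_rotation((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (2.0, 0.0, 0.0))
    print(q, rotate(q, (0.0, 1.0, 0.0)))
```

A full solver would additionally resolve twist around the bone axis, express each rotation in its parent's local frame, and enforce the asset's joint constraints before writing channels to a format such as BVH.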