4DEquine: Disentangling Motion and Appearance for 4D Equine Reconstruction from Monocular Video

Abstract

4D reconstruction of equines (e.g., horses) from monocular video is important for animal welfare. Existing mainstream 4D animal reconstruction methods require joint optimization of motion and appearance over an entire video, which is time-consuming and sensitive to incomplete observations. In this work, we propose a novel framework called 4DEquine that disentangles the 4D reconstruction problem into two sub-problems: dynamic motion reconstruction and static appearance reconstruction. For motion, we introduce a simple yet effective spatio-temporal transformer with a post-optimization stage to regress smooth, pixel-aligned pose and shape sequences from video. For appearance, we design a novel feed-forward network that reconstructs a high-fidelity, animatable 3D Gaussian avatar from as few as a single image. To assist training, we create a large-scale synthetic motion dataset, VarenPoser, which features high-quality surface motions and diverse camera trajectories, as well as a synthetic appearance dataset, VarenTex, comprising realistic multi-view images generated through multi-view diffusion. Although trained only on synthetic datasets, 4DEquine achieves state-of-the-art performance on the real-world APT36K and AiM datasets, demonstrating the superiority of 4DEquine and our new datasets for both geometry and appearance reconstruction. Comprehensive ablation studies validate the effectiveness of both the motion and appearance reconstruction networks. Project page: https://luoxue-star.github.io/4DEquine_Project_Page/.
