4DEquine: Disentangling Motion and Appearance for 4D Equine Reconstruction from Monocular Video

Abstract

4D reconstruction of equids (e.g., horses) from monocular video is important for animal welfare. Mainstream 4D animal reconstruction methods jointly optimize motion and appearance over the whole video, which is time-consuming and sensitive to incomplete observations. In this work, we propose 4DEquine, a novel framework that disentangles the 4D reconstruction problem into two sub-problems: dynamic motion reconstruction and static appearance reconstruction. For motion, we introduce a simple yet effective spatio-temporal transformer with a post-optimization stage to regress smooth, pixel-aligned pose and shape sequences from video. For appearance, we design a novel feed-forward network that reconstructs a high-fidelity, animatable 3D Gaussian avatar from as few as a single image. To support training, we create VarenPoser, a large-scale synthetic motion dataset featuring high-quality surface motions and diverse camera trajectories, and VarenTex, a synthetic appearance dataset comprising realistic multi-view images generated through multi-view diffusion. Although trained only on synthetic datasets, 4DEquine achieves state-of-the-art performance on the real-world APT36K and AiM datasets, demonstrating the strength of both the framework and the new datasets for geometry and appearance reconstruction. Comprehensive ablation studies validate the effectiveness of both the motion and appearance reconstruction networks. Project page: https://luoxue-star.github.io/4DEquine_Project_Page/
