SAM 3D Animal: Promptable 3D Animal Reconstruction from In-the-Wild Images

Abstract

3D animal reconstruction in the wild remains challenging because of large variation across species, frequent occlusions, and the prevalence of multi-animal scenes, yet existing methods focus predominantly on single-animal settings. We present SAM 3D Animal, the first promptable framework for multi-animal 3D reconstruction from a single image. Built on the SMAL+ parametric animal model, our method jointly reconstructs multiple instances and supports flexible prompts in the form of keypoints and masks, which enable more reliable disambiguation in crowded and occluded scenes. To train such a model, we further introduce Herd3D, a multi-animal 3D dataset of over 5K images, designed to increase diversity in species, interactions, and occlusion patterns. Experiments on the Animal3D, APTv2, and Animal Kingdom datasets show that our framework achieves state-of-the-art results, outperforming existing model-based and model-free methods, and offers a scalable and effective solution for prompt-driven 3D animal reconstruction in the wild.