4DGS360: 360° Gaussian Reconstruction of Dynamic Objects from a Single Video

Abstract


We introduce 4DGS360, a diffusion-free framework for 360° dynamic object reconstruction from casual monocular video. Existing methods often fail to reconstruct consistent 360° geometry because their heavy reliance on 2D-native priors causes the initial points to overfit to the surfaces visible in each training view. 4DGS360 addresses this challenge with an advanced 3D-native initialization that mitigates the geometric ambiguity of occluded regions. Our proposed 3D tracker, AnchorTAP3D, produces reinforced 3D point trajectories by using confident 2D track points as anchors, which suppresses drift and provides a reliable initialization that preserves geometry in occluded regions. Combined with subsequent optimization, this initialization yields coherent 360° 4D reconstructions. We further present iPhone360, a new benchmark in which test cameras are placed up to 135° away from the training views, enabling the 360° evaluation that existing datasets cannot provide. Experiments show that 4DGS360 achieves state-of-the-art performance on the iPhone360, iPhone, and DAVIS datasets, both qualitatively and quantitatively.
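The iPhone360 benchmark is characterized by how far its test cameras sit from the training views. As an illustrative sketch only, assuming that "apart" means the angle subtended at the object center by a test camera and its nearest training camera (an assumption; the exact definition used by the benchmark is not stated here), that separation could be computed as below. The helpers `angle_deg` and `min_angular_separation` are hypothetical and are not taken from any 4DGS360 code.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def angle_deg(a: Vec3, b: Vec3) -> float:
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("camera position coincides with the object center")
    # Clamp to [-1, 1] so floating-point overshoot cannot break acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_theta))


def min_angular_separation(
    test_cam: Vec3,
    train_cams: Sequence[Vec3],
    center: Vec3 = (0.0, 0.0, 0.0),
) -> float:
    """Hypothetical helper (not from the 4DGS360 code): smallest angle, seen
    from `center`, between a test camera position and any training camera."""
    def rel(p: Vec3) -> Vec3:
        # Express a camera position relative to the assumed object center.
        return (p[0] - center[0], p[1] - center[1], p[2] - center[2])

    t = rel(test_cam)
    return min(angle_deg(t, rel(c)) for c in train_cams)


if __name__ == "__main__":
    # Training cameras on a 0-45 degree arc (radius 2, y = 0); test camera at 180 degrees.
    def on_circle(deg: float) -> Vec3:
        r = math.radians(deg)
        return (2.0 * math.cos(r), 0.0, 2.0 * math.sin(r))

    train = [on_circle(d) for d in (0, 15, 30, 45)]
    print(round(min_angular_separation(on_circle(180), train), 3))
```

With training cameras every 15° on an arc from 0° to 45° and a test camera at 180°, the nearest training camera is the one at 45°, so the separation comes out at approximately 135°, matching the upper bound quoted for iPhone360. Measuring from the object center rather than between optical axes is a simplification chosen here; for cameras that roughly orbit and face the object, the two notions give similar values.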
