ReconPhys: Reconstruct Appearance and Physical Attributes from Single Video

Abstract

Reconstructing non-rigid objects with physical plausibility remains a significant challenge. Existing approaches use differentiable rendering for per-scene optimization; they recover geometry and dynamics but require expensive tuning or manual annotation, which limits their practicality and generalizability. To address this, we propose ReconPhys, the first feedforward framework that jointly learns physical attribute estimation and 3D Gaussian Splatting reconstruction from a single monocular video. Our method employs a dual-branch architecture trained with a self-supervised strategy, eliminating the need for ground-truth physics labels. Given a video sequence, ReconPhys simultaneously infers geometry, appearance, and physical attributes. Experiments on a large-scale synthetic dataset demonstrate superior performance: our method achieves 21.64 PSNR in future prediction, versus 13.27 for state-of-the-art optimization baselines, while reducing Chamfer Distance from 0.349 to 0.004. Crucially, ReconPhys runs inference in under 1 second, whereas existing methods require hours, facilitating rapid generation of simulation-ready assets for robotics and graphics.
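For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how PSNR and Chamfer Distance are commonly computed. It is not taken from the paper: the function names, the image value range (`max_val=1.0`), and the choice of a symmetric, mean-of-squared-distances Chamfer variant are all assumptions, and the paper's exact conventions (units, normalization, squared versus unsquared distances) may differ.

```python
import numpy as np


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB (higher is better).

    Assumes both images share the same shape and lie in [0, max_val].
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets (lower is better).

    a has shape (N, D), b has shape (M, D). This variant (an assumption)
    averages squared nearest-neighbour distances in each direction and
    sums the two averages; other conventions exist.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    # Pairwise squared distances via broadcasting, shape (N, M).
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```

The brute-force pairwise distance matrix keeps the sketch short but needs O(N·M) memory; for large point clouds a nearest-neighbour index would be the usual substitute.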
