AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model

Abstract

Sparse-view 3D reconstruction is essential for modeling scenes from casual captures, but it remains challenging for non-generative reconstruction. Existing diffusion-based approaches mitigate this issue by synthesizing novel views, but they often condition on only one or two captured frames, which restricts geometric consistency and limits scalability to large or diverse scenes. We propose AnyRecon, a scalable framework for reconstruction from arbitrary, unordered sparse inputs that preserves explicit geometric control while supporting a flexible number of conditioning views. To support long-range conditioning, our method constructs a persistent global scene memory by prepending a cache of captured views, and it removes temporal compression to maintain frame-level correspondence under large viewpoint changes. Beyond a better generative model, we also find that the interplay between generation and reconstruction is crucial for large-scale 3D scenes. We therefore introduce a geometry-aware conditioning strategy that couples generation and reconstruction through an explicit 3D geometric memory and geometry-driven capture-view retrieval. To ensure efficiency, we combine 4-step diffusion distillation with context-window sparse attention to reduce the quadratic complexity of attention. Extensive experiments demonstrate robust and scalable reconstruction across irregular inputs, large viewpoint gaps, and long trajectories.
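The abstract does not spell out how the context-window sparse attention is laid out. As a minimal sketch only, assume a frame-level mask in which every frame may attend to the prepended capture-view cache, while generated frames additionally attend to generated frames within a fixed window. The function names, the parameters `num_cache`, `num_gen`, and `window`, and the masking rule itself are hypothetical illustrations and are not taken from the paper.

```python
def context_window_mask(num_cache, num_gen, window):
    """Build a boolean frame-level attention mask (illustrative assumption).

    Frames 0 .. num_cache-1 are the prepended capture views (global memory).
    Frames num_cache .. num_cache+num_gen-1 are the generated frames.
    mask[q][k] is True when query frame q may attend to key frame k.
    """
    n = num_cache + num_gen
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            if k < num_cache:
                # Assumed rule: every frame can read the cached capture views.
                mask[q][k] = True
            elif q >= num_cache and abs(q - k) <= window:
                # Assumed rule: generated frames see only nearby generated frames.
                mask[q][k] = True
    return mask


def attended_pairs(mask):
    """Count the (query, key) frame pairs that the mask allows."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    mask = context_window_mask(num_cache=4, num_gen=16, window=2)
    dense = len(mask) ** 2
    print("allowed pairs:", attended_pairs(mask), "of", dense)
```

Under these assumptions, the number of allowed pairs grows roughly as `num_gen * (num_cache + 2 * window + 1)`, which is linear in the trajectory length for a fixed cache and window, whereas dense attention grows quadratically in the total number of frames.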


AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model
