Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model

Abstract

Room layout estimation from multiple perspective images has been poorly investigated because of the complexities that emerge from multi-view geometry, which requires multi-step solutions such as camera intrinsic and extrinsic estimation, image matching, and triangulation. However, in 3D reconstruction, the advancement of recent 3D foundation models such as DUSt3R has shifted the paradigm from the traditional multi-step structure-from-motion process to an end-to-end single-step approach. To this end, we introduce Plane-DUSt3R, a novel method for multi-view room layout estimation that leverages the 3D foundation model DUSt3R. Plane-DUSt3R incorporates the DUSt3R framework and fine-tunes it on a room layout dataset (Structured3D) with a modified objective to estimate structural planes. By generating uniform and parsimonious results, Plane-DUSt3R enables room layout estimation with only a single post-processing step and 2D detection results. Unlike previous methods that rely on a single perspective image or a panorama image, Plane-DUSt3R extends the setting to multiple perspective images. Moreover, it offers a streamlined, end-to-end solution that simplifies the process and reduces error accumulation. Experimental results demonstrate that Plane-DUSt3R not only outperforms state-of-the-art methods on the synthetic dataset but also proves robust and effective on in-the-wild data with different image styles, such as cartoons. Our code is available at: https://github.com/justacar/Plane-DUSt3R
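
The abstract does not detail the post-processing step. Purely as an illustration, and not as Plane-DUSt3R's actual procedure, the sketch below shows one generic way to recover a structural plane from per-pixel 3D points (for example, a DUSt3R-style pointmap) restricted to a 2D region mask: a least-squares plane fit via SVD. The function name `fit_plane`, the (H, W, 3) pointmap layout, and the boolean mask input are assumptions made for this example.

```python
import numpy as np


def fit_plane(pointmap: np.ndarray, mask: np.ndarray) -> tuple[np.ndarray, float]:
    """Least-squares fit of a plane n . x + d = 0 to the 3D points selected by a 2D mask.

    pointmap: (H, W, 3) array of per-pixel 3D points (assumed DUSt3R-style layout).
    mask:     (H, W) boolean array marking the pixels of one structural plane
              (assumed to come from some 2D detector; hypothetical input).
    Returns the unit normal n and the offset d.
    """
    pts = pointmap[mask]  # (N, 3) points belonging to the selected region
    if pts.shape[0] < 3:
        raise ValueError("need at least 3 points to fit a plane")
    centroid = pts.mean(axis=0)
    # Singular values come back in descending order, so the last right singular
    # vector of the centered points is the direction of least variance: the normal.
    _, _, vt = np.linalg.svd(pts - centroid, full_matrices=False)
    normal = vt[-1] / np.linalg.norm(vt[-1])
    d = -float(normal @ centroid)  # the plane passes through the centroid
    return normal, d


# Usage: a synthetic horizontal plane at z = 2 covering a 4 x 5 image.
H, W = 4, 5
ys, xs = np.mgrid[0:H, 0:W].astype(float)
pointmap = np.stack([xs, ys, np.full_like(xs, 2.0)], axis=-1)
n, d = fit_plane(pointmap, np.ones((H, W), dtype=bool))
print(n, d)
```

The sign of the returned normal is arbitrary because SVD does not fix it, so a caller would need its own convention, such as orienting normals toward the camera.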