CM-EVS: Sparse Panoramic RGB-D-Pose Data for Complete Scene Coverage

Abstract

Modern 3D visual learning relies on observations sampled from metric 3D assets, yet existing scans, meshes, point clouds, simulations, and reconstructions do not directly provide a sparse, comparable, and geometry-consistent panoramic training interface. Dense trajectories duplicate nearby views, source-specific rendering policies yield heterogeneous annotations, and sparse heuristics may miss important regions or introduce depth-inconsistent observations. We study how to convert 3D assets into sparse panoramic RGB-D-pose data that preserves complete scene coverage with low redundancy and auditable provenance. We propose COVER (Coverage-Oriented Viewpoint curation with ERP Range-depth warping), a training-free ERP viewpoint curator that projects geometry observed from already selected views into candidate ERP probes, scores incremental coverage, and penalizes depth conflicts. Under bounded proxy error, its greedy coverage proxy preserves the standard coverage-style approximation behavior up to an additive error term. Using COVER, we build CM-EVS (Coverage-curated Metric ERP View Set), a panoramic RGB-D-pose dataset with 36,373 curated ERP frames from 1,275 indoor scenes across Blender indoor, HM3D, and ScanNet++, complemented by outdoor panoramas from TartanGround and OB3D re-encoded into the same schema. Each frame provides full-sphere RGB, metric range depth, and a calibrated pose; COVER-produced indoor frames additionally include per-step provenance logs. With a median of only 25 frames per indoor scene, CM-EVS covers all 13 unified room types while maintaining compact scene-level coverage. Experiments show that COVER improves the coverage-conflict trade-off, making CM-EVS a sparse, compact, and auditable RGB-D-pose resource for geometry-consistent panoramic 3D learning.
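To make the selection loop described above concrete, the following is a minimal sketch of greedy coverage-oriented view selection with a conflict penalty and a per-step log. It is not the COVER implementation: the abstract does not specify the scoring function, so the score used here (newly covered elements minus a weighted conflict term), the precomputed visibility sets, the static per-candidate conflict values, and every name (`greedy_select`, `visible`, `conflict`, `lam`, `min_gain`) are assumptions made for illustration. In particular, COVER derives coverage and conflicts by warping range depth into candidate ERP probes, which this sketch replaces with lookup tables.

```python
from typing import Dict, List, Set, Tuple


def greedy_select(
    visible: Dict[str, Set[int]],   # hypothetical: candidate view -> surface element ids it sees
    conflict: Dict[str, float],     # hypothetical: candidate view -> depth-conflict measure
    lam: float = 1.0,               # assumed weight of the conflict penalty
    min_gain: float = 0.0,          # stop when no candidate scores above this
) -> Tuple[List[str], List[dict]]:
    """Greedily pick views by (incremental coverage - lam * conflict)."""
    covered: Set[int] = set()
    selected: List[str] = []
    log: List[dict] = []            # per-step provenance record
    remaining = set(visible)

    while remaining:
        best, best_score, best_new = None, min_gain, 0
        # Sorted iteration makes tie-breaking deterministic.
        for view in sorted(remaining):
            new = len(visible[view] - covered)       # incremental coverage
            score = new - lam * conflict[view]       # penalize conflicts
            if score > best_score:
                best, best_score, best_new = view, score, new
        if best is None:
            break                                    # no candidate is worth adding
        covered |= visible[best]
        selected.append(best)
        remaining.remove(best)
        log.append({
            "step": len(selected),
            "view": best,
            "new_elements": best_new,
            "score": best_score,
            "covered_total": len(covered),
        })
    return selected, log


if __name__ == "__main__":
    # Toy scene with six surface elements and four candidate views.
    visible = {"a": {1, 2, 3, 4}, "b": {3, 4, 5}, "c": {5, 6}, "d": {1, 2}}
    conflict = {"a": 0.0, "b": 0.5, "c": 0.0, "d": 0.0}
    views, steps = greedy_select(visible, conflict)
    for entry in steps:
        print(entry)
```

In the toy scene, the loop stops as soon as no remaining candidate adds coverage beyond its penalty, which is how a greedy rule of this kind keeps the selected set sparse. The per-step log plays the role of a provenance record: each selected view can be traced back to the gain and score that justified it.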