CM-EVS: Sparse Panoramic RGB-D-Pose Data for Complete Scene Coverage

Abstract

Modern 3D visual learning relies on observations sampled from metric 3D assets, yet existing scans, meshes, point clouds, simulations, and reconstructions do not directly provide a sparse, comparable, and geometry-consistent panoramic training interface. Dense trajectories duplicate nearby views, source-specific rendering policies yield heterogeneous annotations, and sparse heuristics may miss important regions or introduce depth-inconsistent observations. We study how to convert 3D assets into sparse panoramic RGB-D-pose data that preserves complete scene coverage with low redundancy and auditable provenance. We propose COVER (Coverage-Oriented Viewpoint curation with ERP Range-depth warping), a training-free viewpoint curator for equirectangular projection (ERP) views that projects geometry observed from already-selected views into candidate ERP probes, scores incremental coverage, and penalizes depth conflicts. Under bounded proxy error, greedy selection on its coverage proxy preserves the standard coverage-style approximation behavior up to an additive error term. Using COVER, we build CM-EVS (Coverage-curated Metric ERP View Set), a panoramic RGB-D-pose dataset with 36,373 curated ERP frames from 1,275 indoor scenes across Blender indoor, HM3D, and ScanNet++, complemented by outdoor panoramas from TartanGround and OB3D re-encoded into the same schema. Each frame provides full-sphere RGB, metric range depth, and calibrated pose; COVER-produced indoor frames additionally include per-step provenance logs. With a median of only 25 frames per indoor scene, CM-EVS covers all 13 unified room types while maintaining compact scene-level coverage. Experiments show that COVER improves the coverage-conflict trade-off, making CM-EVS a sparse, compact, and auditable RGB-D-pose resource for geometry-consistent panoramic 3D learning.
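The selection loop described above (score incremental coverage, penalize depth conflicts, pick greedily) can be illustrated with a minimal sketch. This is not the COVER implementation: the function name `greedy_select`, the set-based coverage representation, the fixed per-view conflict sets, and the `penalty` weight are all hypothetical simplifications chosen for illustration. In particular, the abstract describes conflicts as arising from warping observed geometry into candidate ERP probes, whereas here each candidate simply carries a precomputed set of conflicting surface elements.

```python
from typing import Dict, List, Set, Tuple


def greedy_select(
    coverage: Dict[str, Set[int]],
    conflicts: Dict[str, Set[int]],
    budget: int,
    penalty: float = 1.0,
) -> List[Tuple[str, float]]:
    """Greedily pick up to `budget` views.

    coverage[v]  : ids of surface elements visible from candidate view v
                   (hypothetical stand-in for ERP probe coverage).
    conflicts[v] : ids of elements whose depth is inconsistent at view v
                   (hypothetical stand-in for depth-conflict detection).
    Score of a candidate = newly covered elements - penalty * conflicts.
    Selection stops when no candidate has a strictly positive score.
    """
    covered: Set[int] = set()
    chosen: List[Tuple[str, float]] = []
    remaining = set(coverage)

    while remaining and len(chosen) < budget:
        best_view, best_score = None, 0.0
        # Sorted iteration makes tie-breaking deterministic.
        for view in sorted(remaining):
            gain = len(coverage[view] - covered)
            score = gain - penalty * len(conflicts.get(view, set()))
            if score > best_score:
                best_view, best_score = view, score
        if best_view is None:
            break  # no remaining view improves the penalized objective
        covered |= coverage[best_view]
        chosen.append((best_view, best_score))
        remaining.remove(best_view)

    return chosen


# Toy scene: view "d" sees the most, but most of what it sees conflicts.
toy_coverage = {"a": {1, 2, 3, 4}, "b": {3, 4, 5}, "c": {5, 6},
                "d": {1, 2, 3, 4, 5, 6, 7}}
toy_conflicts = {"d": {1, 2, 3, 4, 5}}
selected = greedy_select(toy_coverage, toy_conflicts, budget=5)
```

In this toy setup the penalty steers selection away from the high-coverage but conflict-heavy view `d`; setting `penalty=0.0` reduces the loop to plain greedy set cover, which is the setting in which the standard coverage-style approximation behavior mentioned in the abstract is usually stated.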