Reason, then Re-reason: Improving Spatial Reasoning through Cross-View Revisiting

Abstract

Spatial reasoning from egocentric videos is inherently challenging because the observable evidence is constrained by the camera trajectory. Existing methods rely on single-turn inference, which forces models to resolve geometric ambiguity through semantic priors rather than verifiable evidence. We argue that spatial reasoning should be revisitable: conclusions formed under limited evidence should remain open to revision when complementary viewpoints become available. Building on this insight, we propose Reason, then Re-reason (ReRe), a training-free, inference-time framework with two phases. In the Reason Phase, an MLLM forms a spatial hypothesis from the original video. In the Re-reason Phase, it verifies or revises that hypothesis by observing a synthesized novel-view video. To enable effective cross-view revisiting, we design a Geometry-to-Video pipeline that renders strategically complementary novel views from predicted 3D geometry. These views use an elevated, oblique perspective with scene-spanning coverage, and they preserve the MLLM's native video interface without architectural modifications. Extensive evaluations on VSI-Bench and STI-Bench demonstrate that ReRe substantially improves open-source MLLMs, to a level that rivals proprietary state-of-the-art performance. Project page: https://zhenjiemao.github.io/ReRe/
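
The two-phase control flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `rere`, `ReReResult`, `ask_mllm`, and `render_novel_view` are hypothetical names introduced here, the second-turn prompt wording is invented, and the sketch assumes the Re-reason turn sees only the novel-view video together with the earlier answer. The MLLM and the Geometry-to-Video renderer are passed in as plain callables, so the model is only ever given a video and a text prompt.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Placeholder type: a video is treated as an opaque sequence of frames.
Video = Sequence[object]


@dataclass
class ReReResult:
    hypothesis: str  # answer produced in the Reason phase
    final: str       # answer produced in the Re-reason phase
    revised: bool    # True when the second pass changed the answer


def rere(
    question: str,
    video: Video,
    ask_mllm: Callable[[str, Video], str],           # hypothetical MLLM wrapper
    render_novel_view: Callable[[Video], Video],     # hypothetical renderer
) -> ReReResult:
    # Reason phase: form a spatial hypothesis from the original video.
    hypothesis = ask_mllm(question, video)

    # Geometry-to-Video step (assumed interface): produce a complementary
    # novel-view video of the same scene.
    novel_view = render_novel_view(video)

    # Re-reason phase: verify or revise the hypothesis against the new view.
    # The prompt text below is an assumption made for illustration.
    prompt = (
        f"{question}\n"
        f"Earlier answer from the original video: {hypothesis}\n"
        "This video shows the same scene from a new viewpoint. "
        "Confirm the earlier answer or give a corrected one."
    )
    final = ask_mllm(prompt, novel_view)

    return ReReResult(hypothesis, final, final.strip() != hypothesis.strip())
```

Because both phases go through the same `ask_mllm` callable, the sketch needs no change to the model itself, which mirrors the training-free, interface-preserving property claimed above.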