WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool

Abstract

We present WinT3R, a feed-forward reconstruction model that predicts precise camera poses and high-quality point maps online. Previous methods face a trade-off between reconstruction quality and real-time performance. To address this, we first introduce a sliding window mechanism that ensures sufficient information exchange among the frames within a window, improving the quality of geometric predictions without a large increase in computation. In addition, we use a compact representation of cameras and maintain a global camera token pool, which makes camera pose estimation more reliable without sacrificing efficiency. These designs enable WinT3R to achieve state-of-the-art performance in online reconstruction quality, camera pose estimation, and reconstruction speed, as validated by extensive experiments on diverse datasets. Code and model are publicly available at https://github.com/LiZizun/WinT3R.
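
The two ideas named in the abstract, a sliding window over incoming frames and a global pool holding one compact token per camera, can be illustrated with a minimal sketch. It is not the released implementation: the window size, the stride (and therefore the overlap), the `CameraTokenPool` class, and the placeholder token contents are all assumptions made for illustration, and the actual network that would produce the tokens is omitted.

```python
from typing import Dict, Iterator, List, Tuple


def sliding_windows(num_frames: int, window_size: int, stride: int) -> Iterator[range]:
    """Yield frame-index ranges for consecutive, possibly overlapping windows.

    window_size and stride are hypothetical parameters; the abstract does not
    specify them.
    """
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    start = 0
    while start < num_frames:
        yield range(start, min(start + window_size, num_frames))
        if start + window_size >= num_frames:
            break  # the last window already reaches the end of the stream
        start += stride


class CameraTokenPool:
    """Hypothetical global pool that stores one compact token per frame."""

    def __init__(self) -> None:
        self._tokens: Dict[int, Tuple[str, int]] = {}

    def add(self, frame_idx: int, token: Tuple[str, int]) -> None:
        # Frames revisited by an overlapping window overwrite their old token.
        self._tokens[frame_idx] = token

    def __len__(self) -> int:
        return len(self._tokens)


def process_stream(num_frames: int, window_size: int, stride: int) -> List[Tuple[List[int], int]]:
    """Walk the stream window by window.

    For each window, record its frame indices and how many camera tokens from
    earlier windows were already in the pool when the window was processed.
    """
    pool = CameraTokenPool()
    log: List[Tuple[List[int], int]] = []
    for window in sliding_windows(num_frames, window_size, stride):
        visible = len(pool)  # tokens a pose predictor could consult here
        for idx in window:
            pool.add(idx, ("cam", idx))  # placeholder for a learned token
        log.append((list(window), visible))
    return log


if __name__ == "__main__":
    for frames, visible in process_stream(num_frames=5, window_size=4, stride=2):
        print(frames, "tokens visible from earlier windows:", visible)
```

In this sketch, frames inside a window are handled together, while the pool grows by only one small entry per frame, which is one plausible reading of how global context could be kept without a large increase in computation.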
