WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool
September 5, 2025
Authors: Zizun Li, Jianjun Zhou, Yifan Wang, Haoyu Guo, Wenzheng Chang, Yang Zhou, Haoyi Zhu, Junyi Chen, Chunhua Shen, Tong He
cs.AI
Abstract
We present WinT3R, a feed-forward reconstruction model capable of online
prediction of precise camera poses and high-quality point maps. Previous
methods suffer from a trade-off between reconstruction quality and real-time
performance. To address this, we first introduce a sliding window mechanism
that ensures sufficient information exchange among frames within the window,
thereby improving the quality of geometric predictions without heavy
computational overhead. In addition, we leverage a compact representation of cameras and
maintain a global camera token pool, which enhances the reliability of camera
pose estimation without sacrificing efficiency. These designs enable WinT3R to
achieve state-of-the-art performance in terms of online reconstruction quality,
camera pose estimation, and reconstruction speed, as validated by extensive
experiments on diverse datasets. Code and model are publicly available at
https://github.com/LiZizun/WinT3R.
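
The two ideas named in the abstract, a sliding window over the incoming frames and a global pool of per-frame camera tokens, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `sliding_windows`, `CameraTokenPool`, `run_stream`, and `process_window` are hypothetical, the window size and stride are illustrative values, and the model itself is replaced by a caller-supplied placeholder function.

```python
from typing import Any, Callable, Dict, Iterator, List, Sequence, Tuple


def sliding_windows(n_frames: int, size: int, stride: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open [start, end) index ranges that cover n_frames in order."""
    if size <= 0 or not 0 < stride <= size:
        raise ValueError("need size > 0 and 0 < stride <= size")
    start = 0
    while start < n_frames:
        end = min(start + size, n_frames)
        yield start, end
        if end == n_frames:
            break
        start += stride


class CameraTokenPool:
    """Hypothetical global store holding one compact token per frame."""

    def __init__(self) -> None:
        self._tokens: Dict[int, Any] = {}

    def update(self, frame_idx: int, token: Any) -> None:
        # A later window that overlaps an earlier one overwrites the token.
        self._tokens[frame_idx] = token

    def tokens(self) -> List[Any]:
        return [self._tokens[i] for i in sorted(self._tokens)]

    def __len__(self) -> int:
        return len(self._tokens)


def run_stream(
    frames: Sequence[Any],
    size: int,
    stride: int,
    process_window: Callable[[Sequence[Any], List[Any]], List[Any]],
) -> CameraTokenPool:
    """Process frames window by window, passing the pool as global context.

    process_window is a stand-in for the model: it receives the frames of
    the current window plus all camera tokens gathered so far, and must
    return one new token per frame in the window.
    """
    pool = CameraTokenPool()
    for start, end in sliding_windows(len(frames), size, stride):
        context = pool.tokens()
        new_tokens = process_window(frames[start:end], context)
        for idx, token in zip(range(start, end), new_tokens):
            pool.update(idx, token)
    return pool


if __name__ == "__main__":
    # Placeholder "model": tag each frame with the number of context tokens seen.
    def toy(window: Sequence[int], ctx: List[Any]) -> List[Any]:
        return [(f, len(ctx)) for f in window]

    print(run_stream(list(range(6)), size=4, stride=2, process_window=toy).tokens())
```

In this sketch each window only attends to its own frames plus the compact tokens in the pool, so the per-window cost depends on the window size and the number of stored tokens rather than on every past frame in full.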