WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool
September 5, 2025
Authors: Zizun Li, Jianjun Zhou, Yifan Wang, Haoyu Guo, Wenzheng Chang, Yang Zhou, Haoyi Zhu, Junyi Chen, Chunhua Shen, Tong He
cs.AI
Abstract
We present WinT3R, a feed-forward reconstruction model capable of online
prediction of precise camera poses and high-quality point maps. Previous
methods suffer from a trade-off between reconstruction quality and real-time
performance. To address this, we first introduce a sliding window mechanism
that ensures sufficient information exchange among frames within the window,
thereby improving the quality of geometric predictions without heavy
computation. In addition, we leverage a compact representation of cameras and
maintain a global camera token pool, which enhances the reliability of camera
pose estimation without sacrificing efficiency. These designs enable WinT3R to
achieve state-of-the-art performance in terms of online reconstruction quality,
camera pose estimation, and reconstruction speed, as validated by extensive
experiments on diverse datasets. Code and model are publicly available at
https://github.com/LiZizun/WinT3R.
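
The two ideas named in the abstract, a sliding window over the incoming frames and a global pool of compact camera tokens, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the window size, whether windows overlap, or the token format, so `window_size`, `stride`, `sliding_windows`, `run_stream`, and the placeholder `("cam_token", frame)` tuples are all hypothetical choices made for illustration only.

```python
from collections import deque
from typing import Iterable, Iterator, List, Tuple


def sliding_windows(
    frames: Iterable[int], window_size: int, stride: int
) -> Iterator[List[int]]:
    """Yield fixed-size windows from a frame stream (hypothetical helper).

    A window is emitted once the buffer is full and at least `stride`
    new frames have arrived since the last emission. Trailing frames
    that do not complete a stride are not emitted.
    """
    buffer: deque = deque(maxlen=window_size)
    since_last = 0
    for frame in frames:
        buffer.append(frame)
        since_last += 1
        if len(buffer) == window_size and since_last >= stride:
            yield list(buffer)
            since_last = 0


def run_stream(
    frames: Iterable[int], window_size: int = 4, stride: int = 2
) -> List[Tuple[str, int]]:
    """Process a stream window by window while growing a global token pool."""
    camera_token_pool: List[Tuple[str, int]] = []  # one compact entry per frame
    seen = set()
    for window in sliding_windows(frames, window_size, stride):
        for frame in window:
            if frame not in seen:
                seen.add(frame)
                # Placeholder: a real model would store a learned camera token.
                camera_token_pool.append(("cam_token", frame))
        # Assumption: a real model would let the frames in `window` exchange
        # information here and consult `camera_token_pool` for pose prediction.
    return camera_token_pool


windows = list(sliding_windows(range(6), window_size=4, stride=2))
pool = run_stream(range(6))
```

In this sketch each frame contributes only one small entry to the pool, which is one plausible reading of why a compact camera representation could let the pool cover the whole sequence cheaply; the actual design is described in the paper and the linked repository.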