WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool

September 5, 2025
Authors: Zizun Li, Jianjun Zhou, Yifan Wang, Haoyu Guo, Wenzheng Chang, Yang Zhou, Haoyi Zhu, Junyi Chen, Chunhua Shen, Tong He
cs.AI

Abstract

We present WinT3R, a feed-forward reconstruction model that predicts precise camera poses and high-quality point maps online. Previous methods suffer from a trade-off between reconstruction quality and real-time performance. To address this, we first introduce a sliding window mechanism that ensures sufficient information exchange among the frames within a window, improving the quality of geometric predictions without heavy computation. In addition, we adopt a compact representation of cameras and maintain a global camera token pool, which makes camera pose estimation more reliable without sacrificing efficiency. These designs enable WinT3R to achieve state-of-the-art performance in online reconstruction quality, camera pose estimation, and reconstruction speed, as validated by extensive experiments on diverse datasets. The code and model are publicly available at https://github.com/LiZizun/WinT3R.
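To make the two ideas in the abstract concrete, the following is a minimal, illustrative sketch of how a sliding window over a frame stream and a global pool of compact per-frame tokens could fit together. It is not the WinT3R implementation: the abstract does not give the window size, the stride, the token layout, or how tokens are consumed, so `sliding_windows`, `CameraTokenPool`, `run_stream`, and the `summarize` callback are all hypothetical names, and frames are stood in for by plain lists of floats.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, Sequence, Tuple

Frame = Sequence[float]    # stand-in for per-frame features (assumption)
Token = Tuple[float, ...]  # stand-in for a compact camera token (assumption)


def sliding_windows(stream: Iterable[Frame], size: int, stride: int) -> Iterator[List[Frame]]:
    """Yield windows of `size` frames, advancing by `stride` frames each time.

    Trailing frames that do not complete a stride are left unprocessed
    in this sketch.
    """
    if size <= 0 or not 0 < stride <= size:
        raise ValueError("need size > 0 and 0 < stride <= size")
    buf: Deque[Frame] = deque(maxlen=size)  # oldest frames fall out automatically
    since_last = 0
    for frame in stream:
        buf.append(frame)
        since_last += 1
        if len(buf) == size and since_last >= stride:
            yield list(buf)
            since_last = 0


class CameraTokenPool:
    """Append-only store holding one compact token per processed frame."""

    def __init__(self) -> None:
        self._tokens: List[Token] = []

    def add(self, token: Token) -> None:
        self._tokens.append(token)

    def all(self) -> List[Token]:
        return list(self._tokens)

    def __len__(self) -> int:
        return len(self._tokens)


def run_stream(
    stream: Iterable[Frame],
    size: int,
    stride: int,
    summarize: Callable[[Frame], Token],
) -> Tuple[CameraTokenPool, List[Tuple[int, int]]]:
    """Process a stream window by window while growing a global token pool.

    Returns the pool and a log of (frames in window, pooled tokens visible).
    """
    pool = CameraTokenPool()
    log: List[Tuple[int, int]] = []
    for idx, window in enumerate(sliding_windows(stream, size, stride)):
        context = pool.all()  # tokens from every earlier frame, not just this window
        log.append((len(window), len(context)))
        # Only frames that are new to this window contribute tokens,
        # so each frame is stored in the pool exactly once.
        new_frames = window if idx == 0 else window[-stride:]
        for frame in new_frames:
            pool.add(summarize(frame))
    return pool, log


if __name__ == "__main__":
    frames = [[float(i), float(i) + 1.0] for i in range(6)]
    pool, log = run_stream(frames, size=4, stride=2, summarize=lambda f: (sum(f) / len(f),))
    print(log, len(pool))
```

The point of the sketch is the cost profile the abstract alludes to: each step touches only a fixed number of full frames, while the history available to later windows grows by one small token per frame rather than by a full set of frame features.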