WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool

Abstract

We present WinT3R, a feed-forward reconstruction model that predicts precise camera poses and high-quality point maps online. Previous methods face a trade-off between reconstruction quality and real-time performance. To address this, we first introduce a sliding-window mechanism that ensures sufficient information exchange among frames within the window, improving the quality of geometric predictions without heavy computation. In addition, we adopt a compact camera representation and maintain a global camera token pool, which makes camera pose estimation more reliable without sacrificing efficiency. These designs enable WinT3R to achieve state-of-the-art online reconstruction quality, camera pose estimation, and reconstruction speed, as validated by extensive experiments on diverse datasets. Code and model are publicly available at https://github.com/LiZizun/WinT3R.
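The control flow implied by the two ideas in the abstract (frames processed in sliding windows, with a compact per-frame camera entry kept in a global pool) can be illustrated with a small sketch. This is not the released implementation: the window size, the stride, the use of overlapping windows, the `("cam_token", frame)` placeholder, and the function names `sliding_windows` and `stream_reconstruct` are all assumptions made for illustration, and the actual model call is replaced by a stub.

```python
from collections import deque
from typing import Iterable, Iterator, List, Tuple


def sliding_windows(frames: Iterable[int], size: int, stride: int) -> Iterator[List[int]]:
    """Yield windows of `size` consecutive frames, advancing by `stride` frames.

    Trailing frames that do not complete a window are dropped (an assumption).
    """
    if size <= 0 or not 0 < stride <= size:
        raise ValueError("require size > 0 and 0 < stride <= size")
    buf: deque = deque(maxlen=size)
    pending = 0        # frames received since the last emitted window
    emitted = False    # whether the first full window has been emitted
    for frame in frames:
        buf.append(frame)
        pending += 1
        if len(buf) == size and (not emitted or pending == stride):
            yield list(buf)
            emitted = True
            pending = 0


def stream_reconstruct(
    frames: Iterable[int], size: int = 4, stride: int = 2
) -> Tuple[List[Tuple[str, int]], List[Tuple[List[int], int]]]:
    """Process a frame stream window by window while growing a global token pool."""
    pool: List[Tuple[str, int]] = []   # hypothetical global camera token pool
    seen = set()                       # frames that already own a pool entry
    log: List[Tuple[List[int], int]] = []
    for window in sliding_windows(frames, size, stride):
        # Stub for the model call: a real model would process `window` jointly
        # and could read `pool` as global context for pose estimation.
        context_size = len(pool)
        for frame in window:
            if frame not in seen:      # overlapping frames are added only once
                seen.add(frame)
                pool.append(("cam_token", frame))
        log.append((window, context_size))
    return pool, log


pool, log = stream_reconstruct(range(6), size=4, stride=2)
```

In this sketch each window sees only its own frames plus one small pool entry per earlier frame, which is one way to read the abstract's claim that global context can be kept without reprocessing the full history.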
