Continuous 3D Perception Model with Persistent State

January 21, 2025
Authors: Qianqian Wang, Yifei Zhang, Aleksander Holynski, Alexei A. Efros, Angjoo Kanazawa
cs.AI

Abstract

We present a unified framework capable of solving a broad range of 3D tasks. Our approach features a stateful recurrent model that continuously updates its state representation with each new observation. Given a stream of images, this evolving state can be used to generate metric-scale pointmaps (per-pixel 3D points) for each new input in an online fashion. These pointmaps reside within a common coordinate system and can be accumulated into a coherent, dense scene reconstruction that updates as new images arrive. Our model, called CUT3R (Continuous Updating Transformer for 3D Reconstruction), captures rich priors of real-world scenes: not only can it predict accurate pointmaps from image observations, but it can also infer unseen regions of the scene by probing at virtual, unobserved views. Our method is simple yet highly flexible, naturally accepting image sequences of varying length, which may be either video streams or unordered photo collections and may contain both static and dynamic content. We evaluate our method on various 3D/4D tasks and demonstrate competitive or state-of-the-art performance in each. Project Page: https://cut3r.github.io/
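The abstract describes an online loop: each new image updates a persistent state, the state is read out as a per-pixel pointmap in a shared world frame, and the reconstruction grows by accumulating those pointmaps; virtual views can also be probed. The Python sketch below illustrates only that read/write pattern, under loudly stated assumptions, and is not CUT3R's architecture. Here the "state" is a hypothetical camera position plus the last depth grid (CUT3R learns its state with a transformer and predicts pointmaps directly from images), poses are translation-only, depth is supplied as input, and the names `ToyStatefulReconstructor`, `update`, `probe`, and `unproject` are invented for illustration. Because the toy `probe` merely reuses the last depth grid, it cannot infer unseen content the way the paper describes; it only shows a query that reads the state without writing it.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float, float]
Pointmap = List[List[Point]]  # H x W grid: one 3D point per pixel


def unproject(depth: List[List[float]], cam_pos: Point, f: float = 1.0) -> Pointmap:
    """Pinhole unprojection of a depth grid into world-frame 3D points.

    Illustrative assumptions (not from CUT3R): translation-only camera pose,
    focal length f, and principal point at the image centre.
    """
    h, w = len(depth), len(depth[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    return [
        [
            (
                (u - cx) * d / f + cam_pos[0],
                (v - cy) * d / f + cam_pos[1],
                d + cam_pos[2],
            )
            for u, d in enumerate(row)
        ]
        for v, row in enumerate(depth)
    ]


@dataclass
class ToyStatefulReconstructor:
    """Hypothetical toy model; its state is a camera position plus the last depth grid."""

    cam_pos: Point = (0.0, 0.0, 0.0)
    last_depth: List[List[float]] = field(default_factory=list)
    scene: List[Point] = field(default_factory=list)  # accumulated reconstruction

    def update(self, depth: List[List[float]], motion: Point) -> Pointmap:
        """Consume one observation: write the state, then read out a pointmap."""
        self.cam_pos = tuple(p + m for p, m in zip(self.cam_pos, motion))
        self.last_depth = depth
        pointmap = unproject(depth, self.cam_pos)
        # All pointmaps share one world frame, so accumulation is plain concatenation.
        self.scene.extend(pt for row in pointmap for pt in row)
        return pointmap

    def probe(self, offset: Point) -> Pointmap:
        """Read-only query at a virtual view offset from the current camera; state is unchanged."""
        virtual_pos = tuple(p + o for p, o in zip(self.cam_pos, offset))
        return unproject(self.last_depth, virtual_pos)


# Usage: two 2x2 frames, the second taken after moving the camera 1 unit along x.
model = ToyStatefulReconstructor()
pm1 = model.update([[1.0, 1.0], [1.0, 1.0]], motion=(0.0, 0.0, 0.0))
pm2 = model.update([[2.0, 2.0], [2.0, 2.0]], motion=(1.0, 0.0, 0.0))
print(len(model.scene))  # 8
print(pm2[0][0])  # (0.0, -1.0, 2.0)
```

Keeping `update` (read and write) separate from `probe` (read only) mirrors the distinction the abstract draws between consuming real observations and querying unobserved views. In this toy, concatenation suffices for accumulation only because every pointmap is already expressed in the same world frame.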