Scaling Face Interaction Graph Networks to Real World Scenes

Abstract

Accurately simulating real-world object dynamics is essential for applications such as robotics, engineering, graphics, and design. To better capture complex real dynamics such as contact and friction, learned simulators based on graph networks have recently shown great promise. However, applying these learned simulators to real scenes comes with two major challenges: first, scaling learned simulators to handle the complexity of real-world scenes, which can involve hundreds of objects, each with a complicated 3D shape; and second, handling inputs from perception rather than 3D state information. Here we introduce a method that substantially reduces the memory required to run graph-based learned simulators. Building on this memory-efficient simulation model, we then present a perceptual interface in the form of editable NeRFs, which can convert real-world scenes into a structured representation that can be processed by a graph network simulator. We show that our method uses substantially less memory than previous graph-based simulators while retaining their accuracy, and that simulators learned in synthetic environments can be applied to real-world scenes captured from multiple camera angles. This paves the way for expanding the application of learned simulators to settings where only perceptual information is available at inference time.
