Scaling Face Interaction Graph Networks to Real World Scenes

Abstract

Accurately simulating real-world object dynamics is essential for applications such as robotics, engineering, graphics, and design. To better capture complex real dynamics such as contact and friction, learned simulators based on graph networks have recently shown great promise. However, applying these learned simulators to real scenes comes with two major challenges: first, scaling learned simulators to handle the complexity of real-world scenes, which can involve hundreds of objects, each with a complicated 3D shape; and second, handling inputs from perception rather than 3D state information. Here we introduce a method that substantially reduces the memory required to run graph-based learned simulators. Building on this memory-efficient simulation model, we then present a perceptual interface in the form of editable NeRFs (Neural Radiance Fields), which can convert real-world scenes into a structured representation that a graph network simulator can process. We show that our method uses substantially less memory than previous graph-based simulators while retaining their accuracy, and that simulators learned in synthetic environments can be applied to real-world scenes captured from multiple camera angles. This paves the way for expanding the application of learned simulators to settings where only perceptual information is available at inference time.
