What If : Understanding Motion Through Sparse Interactions
October 14, 2025
Authors: Stefan Andreas Baumann, Nick Stracke, Timy Phan, Björn Ommer
cs.AI
Abstract
Understanding the dynamics of a physical scene involves reasoning about the
diverse ways it can potentially change, especially as a result of local
interactions. We present the Flow Poke Transformer (FPT), a novel framework for
directly predicting the distribution of local motion, conditioned on sparse
interactions termed "pokes". Unlike traditional methods that typically only
enable dense sampling of a single realization of scene dynamics, FPT provides
an interpretable, directly accessible representation of multi-modal scene
motion, its dependence on physical interactions, and the inherent uncertainties
of scene dynamics. We also evaluate our model on several downstream tasks to
enable comparisons with prior methods and highlight the flexibility of our
approach. On dense face motion generation, our generic pre-trained model
surpasses specialized baselines. FPT can be fine-tuned on strongly
out-of-distribution data, such as synthetic datasets, yielding significant
improvements over in-domain methods in articulated object motion estimation.
Additionally, predicting explicit motion distributions directly enables our
method to achieve competitive performance on tasks like moving-part
segmentation from pokes, which further demonstrates the versatility of FPT.
Code and models are publicly available at
https://compvis.github.io/flow-poke-transformer.
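To make the interface described above concrete, the sketch below is a toy, hypothetical stand-in for poke-conditioned motion prediction: given sparse pokes (position plus applied motion), it returns a mixture distribution over candidate motions at arbitrary query points. The function name, the distance-based softmax weighting, and all parameters are assumptions for illustration; the actual FPT uses a transformer to predict these distributions.

```python
import numpy as np

def predict_motion_distribution(pokes, query_points, sigma=0.2):
    """Toy stand-in for a poke-conditioned motion predictor (NOT the
    real FPT, which uses a transformer). Each poke is (x, y, dx, dy):
    a position and the motion applied there. For every query point we
    return mixture weights over the pokes' motions, with closer pokes
    weighted more heavily via a softmax over negative squared distance.

    pokes:        (P, 4) array-like of (x, y, dx, dy)
    query_points: (Q, 2) array-like of (x, y)
    returns:      weights (Q, P), motions (P, 2)
    """
    pokes = np.asarray(pokes, dtype=float)
    query_points = np.asarray(query_points, dtype=float)
    positions, motions = pokes[:, :2], pokes[:, 2:]
    # Squared distance between every query point and every poke.
    d2 = ((query_points[:, None, :] - positions[None, :, :]) ** 2).sum(-1)
    # Numerically stable softmax over pokes for each query point.
    logits = -d2 / (2.0 * sigma**2)
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights, motions

# Two pokes: one near the top-left pushing right, one near the
# bottom-right pushing up.
pokes = [(0.2, 0.2, 1.0, 0.0),
         (0.8, 0.8, 0.0, -1.0)]
queries = [(0.25, 0.25), (0.75, 0.75)]
weights, motions = predict_motion_distribution(pokes, queries)
# Expected motion per query point under the mixture.
expected = weights @ motions
```

Because the output is an explicit distribution rather than a single dense sample, quantities like the expected motion (as above) or per-point uncertainty are directly accessible, which is what enables downstream uses such as moving-part segmentation from pokes.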