DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness

Abstract

Most 3D object generators focus on aesthetic quality and often neglect the physical constraints that applications require. One such constraint is that a 3D object should be self-supporting, i.e., remain balanced under gravity. Prior approaches to generating stable 3D objects used differentiable physics simulators to optimize geometry at test time, which is slow, unstable, and prone to local optima. Inspired by the literature on aligning generative models to external feedback, we propose Direct Simulation Optimization (DSO), a framework that uses feedback from a (non-differentiable) simulator to increase the likelihood that the 3D generator directly outputs stable 3D objects. We construct a dataset of 3D objects labeled with a stability score obtained from the physics simulator. We can then fine-tune the 3D generator using the stability score as the alignment metric, via direct preference optimization (DPO) or direct reward optimization (DRO), a novel objective we introduce to align diffusion models without requiring pairwise preferences. Our experiments show that the fine-tuned feed-forward generator, using either the DPO or the DRO objective, is much faster and more likely to produce stable objects than test-time optimization. Notably, the DSO framework works even without any ground-truth 3D objects for training, allowing the 3D generator to self-improve by automatically collecting simulation feedback on its own outputs.
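As a rough illustration of the data-construction step described above, the following minimal sketch shows one way simulator scores could be turned into preference pairs for a DPO-style objective. It is not the authors' implementation: the `Sample` fields, the `build_preference_pairs` helper, the score convention (higher means more stable), and the 0.5 threshold are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Sample:
    prompt_id: str    # conditioning input given to the generator (hypothetical field)
    sample_id: str    # identifier of one generated 3D object (hypothetical field)
    stability: float  # simulator score; higher = more stable (assumed convention)


def build_preference_pairs(samples, threshold=0.5):
    """Pair every stable sample with every unstable sample of the same prompt.

    Returns a list of (preferred_id, rejected_id) tuples. The threshold that
    separates stable from unstable is an illustrative assumption.
    """
    by_prompt = defaultdict(list)
    for s in samples:
        by_prompt[s.prompt_id].append(s)

    pairs = []
    for group in by_prompt.values():
        stable = [s for s in group if s.stability >= threshold]
        unstable = [s for s in group if s.stability < threshold]
        # Each (stable, unstable) combination becomes one preference pair.
        pairs.extend((w.sample_id, l.sample_id) for w, l in product(stable, unstable))
    return pairs


# Toy scores standing in for simulator feedback on the generator's own outputs.
samples = [
    Sample("chair", "chair_0", 0.9),
    Sample("chair", "chair_1", 0.1),
    Sample("chair", "chair_2", 0.8),
    Sample("lamp", "lamp_0", 0.2),
]
pairs = build_preference_pairs(samples)
print(pairs)
```

In this toy data the "lamp" prompt yields no pair, because all of its samples fall on the same side of the threshold. An objective that does not require pairwise preferences, which is how the abstract describes DRO, could presumably still use such per-sample scores.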
