

ZeroComp: Zero-shot Object Compositing from Image Intrinsics via Diffusion

October 10, 2024
Authors: Zitian Zhang, Frédéric Fortier-Chouinard, Mathieu Garon, Anand Bhattad, Jean-François Lalonde
cs.AI

Abstract

We present ZeroComp, an effective zero-shot 3D object compositing approach that requires no paired composite-scene images during training. Our method leverages ControlNet to condition on intrinsic images and combines it with a Stable Diffusion model to exploit its scene priors, so that together they operate as an effective rendering engine. During training, ZeroComp uses intrinsic images based on geometry, albedo, and masked shading, all without paired images of scenes with and without composite objects. Once trained, it seamlessly integrates virtual 3D objects into scenes, adjusting shading to create realistic composites. We develop a high-quality evaluation dataset and show that ZeroComp outperforms methods based on explicit lighting estimation as well as generative techniques on both quantitative and human-perception benchmarks. ZeroComp also extends to real and outdoor image compositing, even though it is trained solely on synthetic indoor data, further demonstrating its effectiveness.
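
To make the described conditioning concrete, below is a minimal PyTorch sketch of the two ideas the abstract names: stacking intrinsic maps (geometry, albedo, masked shading) into a conditioning tensor, and injecting that conditioning into a pretrained denoiser via ControlNet-style zero-initialized convolutions. This is not the authors' code: `build_intrinsic_conditioning`, `ZeroConvAdapter`, the channel layout, and all tensor shapes are illustrative assumptions; only the zero-convolution injection pattern is the established ControlNet mechanism.

```python
import torch
import torch.nn as nn

# Hypothetical helper: assemble the intrinsic maps the abstract names into
# one conditioning tensor. The exact channel layout is an assumption.
def build_intrinsic_conditioning(normals, albedo, shading, object_mask):
    """normals:     (B, 3, H, W) surface normals, standing in for geometry
    albedo:      (B, 3, H, W) diffuse albedo
    shading:     (B, 1, H, W) scene shading
    object_mask: (B, 1, H, W) binary mask of the inserted object"""
    # Zero out shading under the object so the diffusion model must
    # re-synthesize plausible shading (and shadows) in that region.
    masked_shading = shading * (1.0 - object_mask)
    return torch.cat([normals, albedo, masked_shading, object_mask], dim=1)

class ZeroConvAdapter(nn.Module):
    """ControlNet-style feature injection: encode the conditioning, then add
    it to the denoiser's features through a zero-initialized 1x1 conv, so the
    pretrained Stable Diffusion priors are untouched at the start of training."""
    def __init__(self, cond_channels: int, feat_channels: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(cond_channels, feat_channels, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(feat_channels, feat_channels, 3, padding=1),
        )
        self.zero_conv = nn.Conv2d(feat_channels, feat_channels, 1)
        nn.init.zeros_(self.zero_conv.weight)
        nn.init.zeros_(self.zero_conv.bias)

    def forward(self, unet_features, conditioning):
        return unet_features + self.zero_conv(self.encoder(conditioning))

# Toy usage with random maps. In a real pipeline the conditioning would be
# resized to the UNet's latent resolution; here both are 64x64 for simplicity.
B, H, W = 1, 64, 64
cond = build_intrinsic_conditioning(
    torch.randn(B, 3, H, W),
    torch.rand(B, 3, H, W),
    torch.rand(B, 1, H, W),
    (torch.rand(B, 1, H, W) > 0.5).float(),
)
adapter = ZeroConvAdapter(cond_channels=cond.shape[1], feat_channels=320)
feats = torch.randn(B, 320, H, W)
assert torch.allclose(adapter(feats, cond), feats)  # no-op at initialization
```

The zero initialization is the key design choice: because the adapter contributes nothing at step zero, training can add intrinsic-image control gradually without disturbing the scene priors already present in the pretrained Stable Diffusion model, which is what lets the combined system act as a rendering engine.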
