ZeroComp: Zero-shot Object Compositing from Image Intrinsics via Diffusion
October 10, 2024
Authors: Zitian Zhang, Frédéric Fortier-Chouinard, Mathieu Garon, Anand Bhattad, Jean-François Lalonde
cs.AI
Abstract
We present ZeroComp, an effective zero-shot 3D object compositing approach that does not require paired composite-scene images during training. Our method leverages ControlNet to condition on intrinsic images and combines it with a Stable Diffusion model to utilize its scene priors, together operating as an effective rendering engine. During training, ZeroComp uses intrinsic images based on geometry, albedo, and masked shading, all without the need for paired images of scenes with and without composited objects. Once trained, it seamlessly integrates virtual 3D objects into scenes, adjusting shading to create realistic composites. We developed a high-quality evaluation dataset and demonstrate that ZeroComp outperforms methods based on explicit lighting estimation and generative techniques on both quantitative and human-perception benchmarks. Additionally, ZeroComp extends to real and outdoor image compositing, even when trained solely on synthetic indoor data, showcasing its effectiveness in image compositing.
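The abstract itself contains no code, but the conditioning scheme it describes is concrete enough to illustrate. Below is a minimal PyTorch sketch of how intrinsic maps (geometry as surface normals and depth, albedo, and shading masked out over the inserted object) could be stacked into a single ControlNet conditioning tensor. The `build_conditioning` helper, the tensor shapes, the 9-channel layout, and the inclusion of the object mask as an extra channel are all illustrative assumptions, not the authors' implementation.

```python
import torch

def build_conditioning(normals, depth, albedo, shading, object_mask):
    """Stack intrinsic maps into one ControlNet conditioning tensor.

    Sketch of the idea described in the abstract: geometry (normals +
    depth), albedo, and shading are concatenated channel-wise, but the
    shading is zeroed wherever the composited object sits, so the
    diffusion model must synthesize (i.e., relight) that region from
    its scene priors. Shapes below are assumptions for illustration:
      normals:     (B, 3, H, W) in [-1, 1]
      depth:       (B, 1, H, W) normalized to [0, 1]
      albedo:      (B, 3, H, W) in [0, 1]
      shading:     (B, 1, H, W) grayscale shading
      object_mask: (B, 1, H, W), 1 inside the inserted object
    """
    # Remove the shading signal inside the object region; the model
    # has to generate shading there that is consistent with the scene.
    masked_shading = shading * (1.0 - object_mask)
    # Channel-wise concatenation -> (B, 9, H, W) conditioning input.
    return torch.cat(
        [normals, depth, albedo, masked_shading, object_mask], dim=1
    )

# Usage with dummy maps (an empty mask, so nothing is hidden):
B, H, W = 1, 512, 512
cond = build_conditioning(
    torch.randn(B, 3, H, W).clamp(-1, 1),  # normals
    torch.rand(B, 1, H, W),                # depth
    torch.rand(B, 3, H, W),                # albedo
    torch.rand(B, 1, H, W),                # shading
    torch.zeros(B, 1, H, W),               # object mask
)
print(cond.shape)  # torch.Size([1, 9, 512, 512])
```

A multi-channel conditioning tensor like this can in principle be fed to a standard Stable Diffusion + ControlNet setup; recent Hugging Face diffusers releases expose a `conditioning_channels` option on `ControlNetModel` for non-RGB conditioning inputs. The denoising U-Net then plays the role of the "rendering engine" the abstract describes, regenerating plausible shading for the masked region.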