ZeroComp: Zero-shot Object Compositing from Image Intrinsics via Diffusion

Abstract

We present ZeroComp, an effective zero-shot 3D object compositing approach that does not require paired composite-scene images during training. Our method leverages ControlNet to condition on intrinsic images and combines it with a Stable Diffusion model to utilize its scene priors, together operating as an effective rendering engine. During training, ZeroComp uses intrinsic images based on geometry, albedo, and masked shading, all without the need for paired images of scenes with and without composite objects. Once trained, it seamlessly integrates virtual 3D objects into scenes, adjusting shading to create realistic composites. We developed a high-quality evaluation dataset and demonstrate that ZeroComp outperforms methods using explicit lighting estimation and generative techniques in quantitative and human perception benchmarks. Additionally, ZeroComp extends to real and outdoor image compositing, even when trained solely on synthetic indoor data, showcasing its effectiveness in image compositing.
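To make the conditioning signal described above more concrete, the following is a minimal sketch of how intrinsic maps (geometry, albedo, and masked shading) might be stacked into a single multi-channel input for a ControlNet-style branch. It is an illustration only and not the authors' implementation: the function name `build_conditioning`, the choice of depth plus normals as the geometry maps, the 9-channel layout, and the idea of appending the mask as its own channel are all assumptions, since the abstract does not specify them.

```python
import numpy as np


def build_conditioning(depth, normals, albedo, shading, object_mask):
    """Stack intrinsic maps into one (H, W, C) conditioning array.

    Hypothetical layout (an assumption, not taken from the paper):
    depth (1) + normals (3) + albedo (3) + masked shading (1) + mask (1) = 9.
    """
    # 1 where shading is kept, 0 inside the masked region.
    keep = 1.0 - object_mask
    # Zero out shading inside the mask so the model has to predict it.
    masked_shading = shading * keep
    return np.concatenate(
        [
            depth[..., None],
            normals,
            albedo,
            masked_shading[..., None],
            object_mask[..., None],
        ],
        axis=-1,
    ).astype(np.float32)


# Toy 4x4 scene with a 2x2 masked region in the centre.
h, w = 4, 4
rng = np.random.default_rng(0)
depth = rng.random((h, w))
normals = rng.random((h, w, 3))
albedo = rng.random((h, w, 3))
shading = rng.random((h, w)) + 0.1  # strictly positive shading values
object_mask = np.zeros((h, w))
object_mask[1:3, 1:3] = 1.0

cond = build_conditioning(depth, normals, albedo, shading, object_mask)
print(cond.shape)  # (4, 4, 9)
```

In this sketch the shading channel is blank wherever the mask is set, which mirrors the "masked shading" idea: the network sees geometry and albedo everywhere but must infer plausible shading in the masked region. How the masked region is chosen during training is not stated in the abstract and is left open here.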
