DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation

Abstract

This paper presents a novel method for exerting fine-grained lighting control during text-driven, diffusion-based image generation. Existing diffusion models can already generate images under any lighting condition, but without additional guidance they tend to correlate image content and lighting. Moreover, text prompts lack the expressive power needed to describe detailed lighting setups. To give the content creator fine-grained control over the lighting during image generation, we augment the text prompt with detailed lighting information in the form of radiance hints, i.e., visualizations of the scene geometry with a homogeneous canonical material under the target lighting. However, the scene geometry needed to produce the radiance hints is unknown. Our key observation is that the radiance hints only need to guide the diffusion process, so they do not have to be exact; it is enough to point the diffusion model in the right direction. Based on this observation, we introduce a three-stage method for controlling the lighting during image generation. In the first stage, we leverage a standard pretrained diffusion model to generate a provisional image under uncontrolled lighting. Next, in the second stage, we resynthesize and refine the foreground object in the generated image by passing the target lighting to a refined diffusion model, named DiLightNet, using radiance hints computed on a coarse shape of the foreground object inferred from the provisional image. To retain the texture details, we multiply the radiance hints with a neural encoding of the provisional synthesized image before passing the result to DiLightNet. Finally, in the third stage, we resynthesize the background to be consistent with the lighting on the foreground object. We demonstrate and validate our lighting-controlled diffusion model on a variety of text prompts and lighting conditions.
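
To make the radiance-hint idea in the second stage more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a white Lambertian (diffuse) canonical material, a single directional light, and a coarse per-pixel normal map with a foreground mask; the abstract specifies none of these. The names diffuse_radiance_hint and modulate are hypothetical, and the small nested list called encoding merely stands in for the neural encoding of the provisional image.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def diffuse_radiance_hint(normals, mask, light_dir):
    """Shade a coarse normal map with an assumed white Lambertian material.

    normals:   H x W grid of (x, y, z) surface normals (coarse shape estimate)
    mask:      H x W grid of booleans, True on the foreground object
    light_dir: direction pointing toward an assumed directional light
    """
    l = normalize(light_dir)
    hint = []
    for row_n, row_m in zip(normals, mask):
        out = []
        for n, is_fg in zip(row_n, row_m):
            if is_fg:
                n = normalize(n)
                # Lambert's cosine law, clamped so back-facing pixels stay dark.
                out.append(max(0.0, sum(a * b for a, b in zip(n, l))))
            else:
                out.append(0.0)  # background carries no hint
        hint.append(out)
    return hint

def modulate(encoding, hint):
    """Multiply every feature channel of each pixel by that pixel's hint value."""
    return [
        [[f * h for f in feat] for feat, h in zip(row_e, row_h)]
        for row_e, row_h in zip(encoding, hint)
    ]

# Toy 1 x 3 image: a pixel facing the light, a pixel at grazing angle, background.
normals = [[(0, 0, 1), (1, 0, 0), (0, 0, 1)]]
mask = [[True, True, False]]
hint = diffuse_radiance_hint(normals, mask, light_dir=(0, 0, 2))

# Placeholder two-channel features standing in for the neural encoding.
encoding = [[[0.5, 2.0], [1.0, 1.0], [3.0, 3.0]]]
modulated = modulate(encoding, hint)
```

In this toy case only the pixel facing the light keeps its features after modulation, which illustrates how multiplying by the hint ties the image features to the target lighting before they are passed on as conditioning.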
