Enhancing High-Resolution 3D Generation through Pixel-wise Gradient Clipping

Abstract

High-resolution 3D object generation remains a challenging task, primarily because comprehensive annotated training data is scarce. Recent work has aimed to overcome this constraint by harnessing image generative models pretrained on extensive curated web datasets, using knowledge transfer techniques such as Score Distillation Sampling (SDS). Meeting the requirements of high-resolution rendering efficiently often necessitates latent representation-based models, such as the Latent Diffusion Model (LDM). In this framework, a significant challenge arises: to compute gradients for individual image pixels, gradients must be backpropagated from the designated latent space through the frozen components of the image model, such as the VAE encoder used within LDM. However, this gradient propagation pathway has never been optimized and remains uncontrolled during training. We find that the unregulated gradients adversely affect the 3D model's capacity to acquire texture-related information from the image generative model, leading to poor-quality appearance synthesis. To address this challenge, we propose an operation termed Pixel-wise Gradient Clipping (PGC), designed for seamless integration into existing 3D generative models, thereby enhancing their synthesis quality. Specifically, we control the magnitude of stochastic gradients by clipping the pixel-wise gradients efficiently while preserving crucial texture-related gradient directions. Despite its simplicity and minimal extra cost, extensive experiments demonstrate the efficacy of PGC in enhancing the performance of existing 3D generative models for high-resolution object rendering.
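To make the clipping idea concrete, the following is a minimal sketch of one plausible reading of the operation described above: each pixel's per-channel gradient vector is rescaled so that its L2 norm does not exceed a threshold, which bounds the magnitude while leaving the direction unchanged. The function names, the nested-list image layout, and the threshold value are assumptions made for illustration and are not taken from the paper; a practical version would presumably be vectorized in a tensor library and applied to the gradient of the rendered image.

```python
import math


def clip_pixel_gradient(grad, threshold):
    """Rescale one pixel's per-channel gradient so its L2 norm is at most `threshold`.

    The direction of the gradient vector is preserved; only its length changes.
    (Hypothetical helper, not the paper's implementation.)
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0 or norm <= threshold:
        return list(grad)  # already within the bound, keep as-is
    scale = threshold / norm
    return [g * scale for g in grad]


def clip_image_gradient(grad_image, threshold):
    """Apply the per-pixel clip to an H x W grid of per-channel gradient vectors."""
    return [
        [clip_pixel_gradient(pixel, threshold) for pixel in row]
        for row in grad_image
    ]


# A 1 x 2 "image" of RGB gradients: one large-magnitude pixel, one small.
grad_image = [
    [[3.0, 4.0, 0.0], [0.1, 0.2, 0.2]],
]
clipped = clip_image_gradient(grad_image, threshold=1.0)
```

In this sketch the first pixel (norm 5) is shrunk to unit norm along the same direction, while the second pixel (norm 0.3) passes through untouched, so only outlier pixels are affected.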
