Enhancing High-Resolution 3D Generation through Pixel-wise Gradient Clipping

Abstract

High-resolution 3D object generation remains challenging, primarily because comprehensive annotated training data is scarce. Recent work addresses this constraint by transferring knowledge from image generative models pretrained on extensive curated web datasets, using techniques such as Score Distillation Sampling (SDS). Meeting the requirements of high-resolution rendering efficiently often requires latent representation-based models such as the Latent Diffusion Model (LDM). This framework raises a significant challenge: to compute gradients for individual image pixels, gradients must be backpropagated from the designated latent space through the frozen components of the image model, such as the VAE encoder used within LDM. However, this gradient propagation pathway has never been optimized and remains uncontrolled during training. We find that the unregulated gradients adversely affect the 3D model's capacity to acquire texture-related information from the image generative model, leading to poor-quality appearance synthesis. To address this challenge, we propose Pixel-wise Gradient Clipping (PGC), an operation designed for seamless integration into existing 3D generative models that enhances their synthesis quality. Specifically, we control the magnitude of stochastic gradients by efficiently clipping the pixel-wise gradients while preserving crucial texture-related gradient directions. Despite its simplicity and minimal extra cost, extensive experiments demonstrate the efficacy of PGC in enhancing the performance of existing 3D generative models for high-resolution object rendering.
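The abstract does not spell out the exact clipping rule, so the following is only a minimal sketch of what clipping pixel-wise gradients while preserving their direction could look like. It assumes that each pixel's gradient vector (for example, its RGB components) is rescaled so that its L2 norm does not exceed a threshold. The function name `clip_pixel_gradients`, the `threshold` parameter, and the nested-list layout are hypothetical and chosen for illustration; a real pipeline would apply the same idea to tensors in its own framework.

```python
import math


def clip_pixel_gradients(grad, threshold):
    """Rescale each pixel's gradient so its L2 norm is at most `threshold`.

    `grad` is an H x W grid of per-pixel gradient vectors (e.g. 3 RGB values).
    The norm-based rule and `threshold` are assumptions made for illustration,
    not the paper's confirmed formulation.
    """
    clipped = []
    for row in grad:
        new_row = []
        for pixel in row:
            norm = math.sqrt(sum(c * c for c in pixel))
            # Small gradients pass through unchanged (scale 1.0); large ones
            # are shrunk to the threshold. A positive scalar factor keeps the
            # direction of the per-pixel gradient vector intact.
            scale = min(1.0, threshold / norm) if norm > 0.0 else 1.0
            new_row.append([c * scale for c in pixel])
        clipped.append(new_row)
    return clipped


# One row with a large-gradient pixel, a small-gradient pixel, and a zero pixel.
example = [[[3.0, 4.0, 0.0], [0.3, 0.4, 0.0], [0.0, 0.0, 0.0]]]
result = clip_pixel_gradients(example, threshold=1.0)
```

In this sketch only the first pixel, whose gradient norm exceeds the threshold, is rescaled. Because every channel of a pixel is multiplied by the same factor, the relative weighting between channels is unchanged, which is one way to read "preserving gradient directions".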

