Enhancing High-Resolution 3D Generation through Pixel-wise Gradient Clipping
October 19, 2023
Authors: Zijie Pan, Jiachen Lu, Xiatian Zhu, Li Zhang
cs.AI
Abstract
High-resolution 3D object generation remains a challenging task primarily due
to the limited availability of comprehensive annotated training data. Recent
advancements have aimed to overcome this constraint by harnessing image
generative models, pretrained on extensive curated web datasets, using
knowledge transfer techniques like Score Distillation Sampling (SDS).
Efficiently addressing the requirements of high-resolution rendering often
necessitates the adoption of latent representation-based models, such as the
Latent Diffusion Model (LDM). In this framework, a significant challenge
arises: To compute gradients for individual image pixels, it is necessary to
backpropagate gradients from the designated latent space through the frozen
components of the image model, such as the VAE encoder used within LDM.
However, this gradient propagation pathway has never been optimized, remaining
uncontrolled during training. We find that the unregulated gradients adversely
affect the 3D model's capacity to acquire texture-related information from
the image generative model, leading to poor-quality appearance synthesis. To
address this overarching challenge, we propose an innovative operation termed
Pixel-wise Gradient Clipping (PGC) designed for seamless integration into
existing 3D generative models, thereby enhancing their synthesis quality.
Specifically, we control the magnitude of stochastic gradients by clipping the
pixel-wise gradients efficiently, while preserving crucial texture-related
gradient directions. Despite its simplicity and minimal extra cost, extensive
experiments demonstrate the efficacy of our PGC in enhancing the performance of
existing 3D generative models for high-resolution object rendering.
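
The clipping step described in the abstract can be illustrated with a minimal sketch. It assumes that "pixel-wise" clipping means bounding the L2 norm of each pixel's gradient vector across its color channels, which limits magnitude while leaving direction unchanged. The function name `clip_pixel_gradients`, the `threshold` parameter, and the nested-list layout `[H][W][C]` are hypothetical choices for illustration, not taken from the paper.

```python
import math


def clip_pixel_gradients(grad, threshold):
    """Clip each pixel's gradient vector to an L2 norm of at most `threshold`.

    grad: nested list shaped [H][W][C], standing in for the gradient of the
          loss with respect to the rendered image (assumed layout).
    threshold: hypothetical clipping bound; must be positive.
    """
    clipped = []
    for row in grad:
        new_row = []
        for pixel in row:
            norm = math.sqrt(sum(g * g for g in pixel))
            # Rescale only when the norm exceeds the bound, so the
            # direction of the per-pixel gradient is preserved.
            scale = threshold / norm if norm > threshold else 1.0
            new_row.append([g * scale for g in pixel])
        clipped.append(new_row)
    return clipped


# One row with two pixels: the first has norm 5.0, the second norm 0.1.
grad = [[[3.0, 4.0, 0.0], [0.1, 0.0, 0.0]]]
clipped = clip_pixel_gradients(grad, threshold=1.0)
```

In this sketch the first pixel is scaled down to unit norm along its original direction, while the second pixel, already below the bound, passes through unchanged. In practice such an operation would be applied to the image-space gradient before it propagates into the 3D representation.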