Enhancing High-Resolution 3D Generation through Pixel-wise Gradient Clipping

Abstract

High-resolution 3D object generation remains a challenging task, primarily because comprehensive annotated training data is scarce. Recent work has sought to overcome this constraint by harnessing image generative models, pretrained on extensive curated web datasets, through knowledge transfer techniques such as Score Distillation Sampling (SDS). Meeting the requirements of high-resolution rendering efficiently often necessitates latent representation-based models, such as the Latent Diffusion Model (LDM). In this framework, a significant challenge arises: to compute gradients for individual image pixels, gradients must be backpropagated from the designated latent space through the frozen components of the image model, such as the VAE encoder used within LDM. However, this gradient propagation pathway has never been optimized and remains uncontrolled during training. We find that the unregulated gradients impair the 3D model's capacity to acquire texture-related information from the image generative model, leading to poor-quality appearance synthesis. To address this challenge, we propose an operation termed Pixel-wise Gradient Clipping (PGC), designed for seamless integration into existing 3D generative models, thereby enhancing their synthesis quality. Specifically, we control the magnitude of stochastic gradients by efficiently clipping the pixel-wise gradients while preserving crucial texture-related gradient directions. Despite its simplicity and minimal extra cost, extensive experiments demonstrate the efficacy of PGC in enhancing the performance of existing 3D generative models for high-resolution object rendering.
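As an illustration of the clipping idea described in the abstract, the following is a minimal NumPy sketch rather than the authors' implementation. The abstract does not state the exact formula, so the norm-based rule used here (rescale each pixel's gradient vector so its L2 norm across channels does not exceed a threshold) is an assumption, and the names `pixelwise_gradient_clip`, `threshold`, and `eps` are hypothetical.

```python
import numpy as np


def pixelwise_gradient_clip(grad, threshold, eps=1e-8):
    """Rescale each pixel's gradient so its L2 norm is at most `threshold`.

    grad: array of shape (H, W, C), the gradient of a loss with respect to
          a rendered image (assumed layout, for illustration only).
    """
    # Per-pixel L2 norm over the channel axis, kept as (H, W, 1) for broadcasting.
    norms = np.linalg.norm(grad, axis=-1, keepdims=True)
    # Factor is 1 where the norm is already within the threshold,
    # otherwise roughly threshold / norm, so the direction is unchanged.
    scale = np.minimum(1.0, threshold / (norms + eps))
    return grad * scale


# Toy gradient: one pixel with a large gradient, one with a small gradient.
grad = np.zeros((1, 2, 3))
grad[0, 0] = [3.0, 4.0, 0.0]  # L2 norm 5, will be scaled down
grad[0, 1] = [0.1, 0.0, 0.0]  # L2 norm 0.1, left untouched
clipped = pixelwise_gradient_clip(grad, threshold=1.0)
```

Because every pixel is multiplied by a non-negative scalar, the direction of its gradient vector across channels is preserved and only the magnitude is bounded. In an SDS-style pipeline, an operation like this would presumably be applied to the image-space gradient before it is propagated into the 3D representation, although the abstract does not specify where the clipping sits.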
