LDM3D-VR: Latent Diffusion Model for 3D VR

Abstract

Latent diffusion models have proven to be state-of-the-art for creating and manipulating visual outputs. However, to the best of our knowledge, work on generating depth maps jointly with RGB remains limited. We introduce LDM3D-VR, a suite of diffusion models for virtual reality development comprising LDM3D-pano and LDM3D-SR. LDM3D-pano generates panoramic RGBD from textual prompts, and LDM3D-SR upscales low-resolution inputs to high-resolution RGBD. Both models are fine-tuned from existing pretrained models on datasets of panoramic or high-resolution RGB images paired with depth maps and captions. We evaluate both models against existing related methods.
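As an illustration of how a panoramic RGBD output could populate a VR scene, the sketch below back-projects an equirectangular RGB image and its depth map into colored 3D points. This is not part of LDM3D-VR: the equirectangular layout (columns span longitude, rows span latitude), the convention that depth is radial distance from the camera, the Y-up/Z-forward axes, and the function name `equirect_rgbd_to_points` are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

# (x, y, z, r, g, b): a 3D position plus the pixel's color.
Point = Tuple[float, float, float, int, int, int]


def equirect_rgbd_to_points(
    rgb: Sequence[Sequence[Tuple[int, int, int]]],
    depth: Sequence[Sequence[float]],
) -> List[Point]:
    """Back-project an equirectangular RGBD panorama to colored 3D points.

    Hypothetical helper. Assumed conventions (not taken from the paper):
    rows go from latitude +pi/2 (top) to -pi/2 (bottom), columns go from
    longitude -pi to +pi, and depth is the radial distance from the camera.
    """
    height = len(depth)
    width = len(depth[0])
    if len(rgb) != height or any(len(row) != width for row in rgb):
        raise ValueError("rgb and depth must have the same dimensions")

    points: List[Point] = []
    for v in range(height):
        # Sample at pixel centers; latitude decreases from top to bottom.
        lat = math.pi / 2 - (v + 0.5) * math.pi / height
        for u in range(width):
            lon = (u + 0.5) * 2 * math.pi / width - math.pi
            d = depth[v][u]
            # Spherical-to-Cartesian conversion with Y up and Z forward.
            x = d * math.cos(lat) * math.sin(lon)
            y = d * math.sin(lat)
            z = d * math.cos(lat) * math.cos(lon)
            r, g, b = rgb[v][u]
            points.append((x, y, z, r, g, b))
    return points


if __name__ == "__main__":
    # Tiny 2x4 panorama: red top row at depth 1, blue bottom row at depth 2.
    demo_rgb = [[(255, 0, 0)] * 4, [(0, 0, 255)] * 4]
    demo_depth = [[1.0] * 4, [2.0] * 4]
    for p in equirect_rgbd_to_points(demo_rgb, demo_depth):
        print(p)
```

Because depth is treated as radial distance, every point with constant depth lands on a sphere of that radius, which gives a quick sanity check that the projection convention matches the data.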
