

LDM3D: Latent Diffusion Model for 3D

May 18, 2023
作者: Gabriela Ben Melech Stan, Diana Wofk, Scottie Fox, Alex Redden, Will Saxton, Jean Yu, Estelle Aflalo, Shao-Yen Tseng, Fabio Nonato, Matthias Muller, Vasudev Lal
cs.AI

Abstract

This research paper proposes a Latent Diffusion Model for 3D (LDM3D) that generates both image and depth map data from a given text prompt, allowing users to generate RGBD images from text prompts. The LDM3D model is fine-tuned on a dataset of tuples containing an RGB image, a depth map, and a caption, and is validated through extensive experiments. We also develop an application called DepthFusion, which uses the generated RGB images and depth maps to create immersive and interactive 360-degree-view experiences using TouchDesigner. This technology has the potential to transform a wide range of industries, from entertainment and gaming to architecture and design. Overall, this paper presents a significant contribution to the fields of generative AI and computer vision, and showcases the potential of LDM3D and DepthFusion to revolutionize content creation and digital experiences. A short video summarizing the approach can be found at https://t.ly/tdi2.
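As a minimal illustration of the output format described above (not code from the paper), an RGB image and its matching depth map can be packed into a single four-channel RGBD array. The function name `pack_rgbd` and the placeholder 512x512 resolution are assumptions for the sketch:

```python
import numpy as np

def pack_rgbd(rgb: np.ndarray, depth: np.ndarray) -> np.ndarray:
    """Stack an (H, W, 3) RGB image with an (H, W) depth map into an
    (H, W, 4) RGBD array, the kind of paired output LDM3D generates."""
    if rgb.shape[:2] != depth.shape:
        raise ValueError("RGB and depth resolutions must match")
    # Append the depth map as a fourth channel alongside R, G, B.
    return np.dstack([rgb, depth[..., None]])

rgb = np.zeros((512, 512, 3), dtype=np.float32)    # placeholder RGB image
depth = np.ones((512, 512), dtype=np.float32)      # placeholder depth map
rgbd = pack_rgbd(rgb, depth)
print(rgbd.shape)  # (512, 512, 4)
```

Downstream tools such as the DepthFusion application consume exactly this kind of aligned RGB-plus-depth pair to build 360-degree views.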