Holodeck: Language Guided Generation of 3D Embodied AI Environments
December 14, 2023
Authors: Yue Yang, Fan-Yun Sun, Luca Weihs, Eli VanderBilt, Alvaro Herrasti, Winson Han, Jiajun Wu, Nick Haber, Ranjay Krishna, Lingjie Liu, Chris Callison-Burch, Mark Yatskar, Aniruddha Kembhavi, Christopher Clark
cs.AI
Abstract
3D simulated environments play a critical role in Embodied AI, but their
creation requires expertise and extensive manual effort, restricting their
diversity and scope. To mitigate this limitation, we present Holodeck, a system
that generates 3D environments matching a user-supplied prompt fully
automatically. Holodeck can generate diverse scenes (e.g., arcades, spas, and
museums), adjust designs to different styles, and capture the semantics of
complex queries such as "apartment for a researcher with a cat" and "office of
a professor who is a fan of Star Wars". Holodeck leverages a large language
model (GPT-4) for common sense knowledge about what the scene might look like
and uses a large collection of 3D assets from Objaverse to populate the scene
with diverse objects. To address the challenge of positioning objects
correctly, we prompt GPT-4 to generate spatial relational constraints between
objects and then optimize the layout to satisfy those constraints. Our
large-scale human evaluation shows that annotators prefer Holodeck over
manually designed procedural baselines in residential scenes and that Holodeck
can produce high-quality outputs for diverse scene types. We also demonstrate
an application of Holodeck in Embodied AI: training agents to navigate novel
scenes such as music rooms and daycares without human-constructed data, a
significant step toward developing general-purpose embodied agents.
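
The abstract describes generating spatial relational constraints between objects and then optimizing the layout to satisfy them, but it does not give the constraint vocabulary or the optimizer. The toy sketch below illustrates the general idea only. Everything in it is an assumption made for illustration: the grid-shaped room, the constraint names `edge`, `near`, and `far`, the hard-coded constraint list (standing in for what a language model might return), and the brute-force search. It is not Holodeck's implementation.

```python
from itertools import permutations

# Hypothetical constraint vocabulary, invented for this sketch:
#   ("edge", a)    -> object a sits on a cell touching a wall
#   ("near", a, b) -> a and b occupy adjacent cells
#   ("far", a, b)  -> a and b are at least 3 cells apart
def satisfied(constraint, pos, width, depth):
    kind, *names = constraint
    if kind == "edge":
        x, y = pos[names[0]]
        return x in (0, width - 1) or y in (0, depth - 1)
    (ax, ay), (bx, by) = pos[names[0]], pos[names[1]]
    dist = abs(ax - bx) + abs(ay - by)  # Manhattan distance in grid cells
    if kind == "near":
        return dist == 1
    if kind == "far":
        return dist >= 3
    raise ValueError(f"unknown constraint kind: {kind}")


def solve_layout(objects, constraints, width, depth):
    """Return (score, positions) for the layout satisfying the most constraints."""
    cells = [(x, y) for x in range(width) for y in range(depth)]
    best_score, best_pos = -1, None
    # permutations() hands out distinct cells, so objects never overlap.
    for combo in permutations(cells, len(objects)):
        pos = dict(zip(objects, combo))
        score = sum(satisfied(c, pos, width, depth) for c in constraints)
        if score > best_score:
            best_score, best_pos = score, pos
    return best_score, best_pos


# Constraints that a language model might plausibly produce for
# "apartment for a researcher with a cat" (hard-coded here).
objects = ["desk", "chair", "cat_bed"]
constraints = [("edge", "desk"), ("near", "chair", "desk"), ("far", "cat_bed", "desk")]
score, layout = solve_layout(objects, constraints, width=4, depth=3)
print(score, layout)
```

Exhaustive search is only practical for a handful of objects on a small grid. A realistic system would need a scalable solver and would have to handle object sizes, rotations, and soft versus hard constraints, none of which this sketch attempts.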