PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes
May 8, 2025
Authors: Ahmed Abdelreheem, Filippo Aleotti, Jamie Watson, Zawar Qureshi, Abdelrahman Eldesokey, Peter Wonka, Gabriel Brostow, Sara Vicente, Guillermo Garcia-Hernando
cs.AI
Abstract
We introduce the novel task of Language-Guided Object Placement in Real 3D
Scenes. Our model is given a 3D scene's point cloud, a 3D asset, and a textual
prompt broadly describing where the 3D asset should be placed. The task here is
to find a valid placement for the 3D asset that respects the prompt. Compared
with other language-guided localization tasks in 3D scenes such as grounding,
this task has specific challenges: it is ambiguous because it has multiple
valid solutions, and it requires reasoning about 3D geometric relationships and
free space. We inaugurate this task by proposing a new benchmark and evaluation
protocol. We also introduce a new dataset for training 3D LLMs on this task, as
well as the first method to serve as a non-trivial baseline. We believe that
this challenging task and our new benchmark could become part of the suite of
benchmarks used to evaluate and compare generalist 3D LLMs.
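
The free-space requirement mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's method or evaluation protocol: the function names, the axis-aligned box approximation of the asset, and the `margin` parameter are all assumptions made for illustration. The check only tests whether any scene point falls strictly inside the asset's box; it does not test whether the placement respects the text prompt.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def asset_box(center: Point, size: Point) -> Tuple[Point, Point]:
    """Axis-aligned box (min corner, max corner) of an asset centred at `center`.

    Hypothetical helper: the asset is approximated by its bounding box.
    """
    lo = tuple(c - s / 2.0 for c, s in zip(center, size))
    hi = tuple(c + s / 2.0 for c, s in zip(center, size))
    return lo, hi


def is_free(scene_points: Iterable[Point], center: Point, size: Point,
            margin: float = 0.0) -> bool:
    """Return True if no scene point lies strictly inside the asset's box.

    `margin` (an assumption) shrinks the box so that points within that
    distance of a face, such as a supporting floor, do not count as collisions.
    """
    lo, hi = asset_box(center, size)
    for p in scene_points:
        if all(l + margin < x < h - margin for x, l, h in zip(p, lo, hi)):
            return False  # a scene point occupies the asset's volume
    return True


# Toy scene: a flat floor at z = 0 plus one obstacle point above it.
scene = [(float(x), float(y), 0.0) for x in range(-2, 3) for y in range(-2, 3)]
scene.append((1.0, 1.0, 0.5))

print(is_free(scene, (0.0, 0.0, 0.2), (0.4, 0.4, 0.4)))  # resting on the floor
print(is_free(scene, (1.0, 1.0, 0.5), (0.4, 0.4, 0.4)))  # overlaps the obstacle
```

Because many different positions can pass such a check, the sketch also shows why the task admits multiple valid solutions: validity constrains the placement but does not single one out.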