PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes
May 8, 2025
Authors: Ahmed Abdelreheem, Filippo Aleotti, Jamie Watson, Zawar Qureshi, Abdelrahman Eldesokey, Peter Wonka, Gabriel Brostow, Sara Vicente, Guillermo Garcia-Hernando
cs.AI
Abstract
We introduce the novel task of Language-Guided Object Placement in Real 3D
Scenes. Our model is given a 3D scene's point cloud, a 3D asset, and a textual
prompt broadly describing where the 3D asset should be placed. The task here is
to find a valid placement for the 3D asset that respects the prompt. Compared
with other language-guided localization tasks in 3D scenes such as grounding,
this task has specific challenges: it is ambiguous because it has multiple
valid solutions, and it requires reasoning about 3D geometric relationships and
free space. We inaugurate this task by proposing a new benchmark and evaluation
protocol. We also introduce a new dataset for training 3D LLMs on this task, as
well as the first method to serve as a non-trivial baseline. We believe that
this challenging task and our new benchmark could become part of the suite of
benchmarks used to evaluate and compare generalist 3D LLMs.
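
To make the free-space requirement concrete, the following is a minimal sketch of one possible validity check for a candidate placement. It is not the paper's method or evaluation protocol: the `Box` class, the `is_collision_free` function, the axis-aligned box standing in for the 3D asset, and the `margin` tolerance are all assumptions introduced here for illustration. It also ignores the language prompt entirely and only tests whether the asset's volume is empty of scene points.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class Box:
    """Axis-aligned box standing in for the 3D asset (an illustrative assumption)."""
    center: Point
    size: Point  # full extents along x, y, z

    def contains(self, p: Point, margin: float = 0.0) -> bool:
        # The box is shrunk by `margin` on every side so that points lying on
        # a supporting surface (touching a face) do not count as inside.
        return all(
            abs(p[i] - self.center[i]) <= self.size[i] / 2 - margin
            for i in range(3)
        )


def is_collision_free(scene_points: List[Point], asset_box: Box,
                      margin: float = 0.01) -> bool:
    """Return True if no scene point falls inside the (shrunk) asset box."""
    return not any(asset_box.contains(p, margin) for p in scene_points)


# Toy scene: a flat floor at z = 0 plus two points of an obstacle near (1, 1).
floor = [(x * 0.1, y * 0.1, 0.0) for x in range(-20, 21) for y in range(-20, 21)]
obstacle = [(1.0, 1.0, 0.1), (1.0, 1.0, 0.3)]
scene = floor + obstacle

asset_size = (0.4, 0.4, 0.4)
on_free_floor = Box(center=(0.0, 0.0, 0.2), size=asset_size)   # rests on the floor
inside_obstacle = Box(center=(1.0, 1.0, 0.2), size=asset_size)  # overlaps the obstacle

print(is_collision_free(scene, on_free_floor))
print(is_collision_free(scene, inside_obstacle))
```

Because many positions on the floor pass this check, the toy example also shows why the task admits multiple valid solutions: a geometric test can only filter candidates, and the prompt must narrow them further.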