PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes

May 8, 2025
Authors: Ahmed Abdelreheem, Filippo Aleotti, Jamie Watson, Zawar Qureshi, Abdelrahman Eldesokey, Peter Wonka, Gabriel Brostow, Sara Vicente, Guillermo Garcia-Hernando
cs.AI

Abstract

We introduce the novel task of Language-Guided Object Placement in Real 3D Scenes. Our model is given a 3D scene's point cloud, a 3D asset, and a textual prompt broadly describing where the 3D asset should be placed. The task here is to find a valid placement for the 3D asset that respects the prompt. Compared with other language-guided localization tasks in 3D scenes such as grounding, this task has specific challenges: it is ambiguous because it has multiple valid solutions, and it requires reasoning about 3D geometric relationships and free space. We inaugurate this task by proposing a new benchmark and evaluation protocol. We also introduce a new dataset for training 3D LLMs on this task, as well as the first method to serve as a non-trivial baseline. We believe that this challenging task and our new benchmark could become part of the suite of benchmarks used to evaluate and compare generalist 3D LLM models.

