MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning

September 26, 2025
Authors: Jinkun Hao, Naifu Liang, Zhen Luo, Xudong Xu, Weipeng Zhong, Ran Yi, Yichen Jin, Zhaoyang Lyu, Feng Zheng, Lizhuang Ma, Jiangmiao Pang
cs.AI

Abstract

For robots to interpret human instructions and execute manipulation tasks, they need task-relevant tabletop scenes for training. However, traditional methods for creating these scenes rely on either time-consuming manual layout design or purely randomized layouts, which limits the scenes' plausibility or their alignment with the tasks. In this paper, we formulate a novel task, task-oriented tabletop scene generation, which is challenging because of the substantial gap between high-level task instructions and tabletop scenes. To support research on this task, we introduce MesaTask-10K, a large-scale dataset comprising approximately 10,700 synthetic tabletop scenes whose manually crafted layouts ensure realism and capture intricate inter-object relations. To bridge the gap between tasks and scenes, we propose a Spatial Reasoning Chain that decomposes the generation process into object inference, spatial interrelation reasoning, and scene graph construction for the final 3D layout. We present MesaTask, an LLM-based framework that uses this reasoning chain and is further enhanced with DPO algorithms to generate physically plausible tabletop scenes that align well with given task descriptions. Extensive experiments demonstrate that MesaTask outperforms baselines in generating task-conforming tabletop scenes with realistic layouts. The project page is at https://mesatask.github.io/
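
The three stages of the Spatial Reasoning Chain named in the abstract (object inference, spatial interrelation reasoning, scene graph construction) can be pictured as a small pipeline. The sketch below is only a toy illustration under stated assumptions: the keyword table, the `right_of` predicate, and all function and class names are hypothetical, and simple rules stand in for the LLM reasoning that MesaTask actually performs. It is not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    subject: str
    predicate: str  # illustrative vocabulary, e.g. "right_of"
    target: str


@dataclass
class SceneGraph:
    objects: list[str] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)


# Hypothetical keyword-to-object table standing in for LLM object inference.
TASK_OBJECTS = {
    "coffee": ["mug", "kettle", "spoon"],
    "write": ["notebook", "pen"],
}


def infer_objects(task: str) -> list[str]:
    """Stage 1: pick objects the task plausibly needs (toy keyword match)."""
    found: list[str] = []
    for keyword, names in TASK_OBJECTS.items():
        if keyword in task.lower():
            found.extend(n for n in names if n not in found)
    return found


def infer_relations(objects: list[str]) -> list[Relation]:
    """Stage 2: relate each object to its predecessor (placeholder rule)."""
    return [Relation(b, "right_of", a) for a, b in zip(objects, objects[1:])]


def build_scene_graph(task: str) -> SceneGraph:
    """Stage 3: assemble inferred objects and relations into one graph."""
    objects = infer_objects(task)
    return SceneGraph(objects=objects, relations=infer_relations(objects))


graph = build_scene_graph("Make a cup of coffee")
```

In this sketch the scene graph is the hand-off point: a later step, not shown here, would turn its nodes and relations into concrete 3D positions and sizes for the final layout.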