Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement

July 10, 2023
Authors: Anthony Simeonov, Ankit Goyal, Lucas Manuelli, Lin Yen-Chen, Alina Sarmiento, Alberto Rodriguez, Pulkit Agrawal, Dieter Fox
cs.AI

Abstract

We propose a system for rearranging objects in a scene to achieve a desired object-scene placing relationship, such as a book inserted in an open slot of a bookshelf. The pipeline generalizes to novel geometries, poses, and layouts of both scenes and objects, and is trained from demonstrations to operate directly on 3D point clouds. Our system overcomes challenges associated with the existence of many geometrically similar rearrangement solutions for a given scene. By leveraging an iterative pose de-noising training procedure, we can fit multi-modal demonstration data and produce multi-modal outputs while remaining precise and accurate. We also show the advantages of conditioning on relevant local geometric features while ignoring irrelevant global structure that harms both generalization and precision. We demonstrate our approach, in both simulation and the real world, on three distinct rearrangement tasks that require handling multi-modality and generalizing over object shape and pose. Project website, code, and videos: https://anthonysimeonov.github.io/rpdiff-multi-modal/
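To give a feel for how iterative pose de-noising can yield outputs that are both multi-modal and precise, here is a minimal toy sketch in Python. It is not the authors' implementation: it works on a 2D pose (x, theta) rather than 3D point clouds, and `toy_denoiser` is a hand-written, hypothetical stand-in for the learned model. `SLOTS`, `denoise_pose`, and `sample_placements` are illustrative names, and the step size and noise schedule are arbitrary assumptions.

```python
import math
import random

# Hypothetical scene: x-coordinates of open shelf slots.
# Every slot is an equally valid placement, so the target is multi-modal.
SLOTS = [-1.0, 0.0, 1.0]


def toy_denoiser(pose):
    """Stand-in for a learned model (assumption, not the paper's network).

    Returns a small (dx, dtheta) correction toward the nearest slot
    and toward the upright orientation theta = 0.
    """
    x, theta = pose
    nearest = min(SLOTS, key=lambda s: abs(s - x))
    return 0.3 * (nearest - x), 0.3 * (0.0 - theta)


def denoise_pose(pose, steps=50, noise0=0.05, rng=None):
    """Iteratively refine a pose, injecting noise that decays to zero."""
    rng = rng or random.Random(0)
    x, theta = pose
    for k in range(steps):
        dx, dtheta = toy_denoiser((x, theta))
        # Linear decay: the noise scale is exactly zero on the last step.
        scale = noise0 * (1.0 - (k + 1) / steps)
        x += dx + rng.gauss(0.0, scale)
        theta += dtheta + rng.gauss(0.0, scale)
    return x, theta


def sample_placements(n=30, seed=0):
    """Draw n random initial poses and de-noise each one independently."""
    rng = random.Random(seed)
    placements = []
    for _ in range(n):
        init = (rng.uniform(-1.5, 1.5),
                rng.uniform(-math.pi / 4, math.pi / 4))
        placements.append(denoise_pose(init, rng=rng))
    return placements


if __name__ == "__main__":
    for x, theta in sample_placements(n=5):
        print(f"x = {x:+.3f}, theta = {theta:+.3f}")
```

In this toy setting, different random initial poses settle into different slots, while each final pose ends up close to a slot center. That is the qualitative behavior the abstract describes, although the real system predicts corrections from point-cloud features rather than from a known list of slots.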