Toward Grounded Social Reasoning
June 14, 2023
Authors: Minae Kwon, Hengyuan Hu, Vivek Myers, Siddharth Karamcheti, Anca Dragan, Dorsa Sadigh
cs.AI
Abstract
Consider a robot tasked with tidying a desk with a meticulously constructed
Lego sports car. A human may recognize that it is not socially appropriate to
disassemble the sports car and put it away as part of the "tidying". How can a
robot reach that conclusion? Although large language models (LLMs) have
recently been used to enable social reasoning, grounding this reasoning in the
real world has been challenging. To reason in the real world, robots must go
beyond passively querying LLMs and *actively gather information from the
environment* that is required to make the right decision. For instance, after
detecting that there is an occluded car, the robot may need to actively
perceive the car to know whether it is an advanced model car made out of Legos
or a toy car built by a toddler. We propose an approach that leverages an LLM
and a vision-language model (VLM) to help a robot actively perceive its
environment to perform grounded social reasoning. To evaluate our framework at
scale, we release the MessySurfaces dataset, which contains images of 70
real-world surfaces that need to be cleaned. We additionally illustrate our
approach with a robot on two carefully designed surfaces. We find an average
12.9% improvement on the MessySurfaces benchmark and an average 15% improvement
on the robot experiments over baselines that do not use active perception. The
dataset, code, and videos of our approach can be found at
https://minaek.github.io/groundedsocialreasoning.
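
To make the described pipeline concrete, below is a minimal sketch of the active-perception loop: an LLM proposes what the robot needs to find out about an object, a VLM answers that question from a close-up image, and the LLM then chooses a socially appropriate action grounded in the answer. The helper functions query_llm and query_vlm are hypothetical stand-ins for real model calls; the paper's actual prompts, models, and robot interface may differ.

# Minimal sketch of the active-perception loop described in the abstract.
# query_llm and query_vlm are hypothetical placeholders, not the paper's API;
# replace them with real LLM / vision-language-model clients.

def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; returns a canned response for illustration."""
    return "Is the car a carefully built Lego model or a toddler's toy?"

def query_vlm(image_path: str, question: str) -> str:
    """Hypothetical VLM call; answers a question about an image."""
    return "It is a meticulously constructed Lego sports car."

def decide_action(closeup_image: str, obj: str) -> str:
    # 1. Ask the LLM what information is missing before acting on the object.
    question = query_llm(
        f"A robot is tidying a desk and sees: {obj}. "
        "What should it find out before deciding how to clean it?"
    )
    # 2. Actively perceive: answer the LLM's question from a close-up image.
    answer = query_vlm(closeup_image, question)
    # 3. Ask the LLM for a socially appropriate action, grounded in the answer.
    return query_llm(
        f"Object: {obj}. Observation: {answer}. "
        "What is the socially appropriate way to tidy this object?"
    )

if __name__ == "__main__":
    print(decide_action("desk_closeup.jpg", "a partially occluded car"))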