
AmbiK: Dataset of Ambiguous Tasks in Kitchen Environment

June 4, 2025
Authors: Anastasiia Ivanova, Eva Bakaeva, Zoya Volovikova, Alexey K. Kovalev, Aleksandr I. Panov
cs.AI

Abstract

As part of an embodied agent, Large Language Models (LLMs) are typically used for behavior planning given natural language instructions from the user. However, dealing with ambiguous instructions in real-world environments remains a challenge for LLMs. Various methods for task ambiguity detection have been proposed, but they are difficult to compare because they are tested on different datasets and there is no universal benchmark. For this reason, we propose AmbiK (Ambiguous Tasks in Kitchen Environment), a fully textual dataset of ambiguous instructions addressed to a robot in a kitchen environment. AmbiK was collected with the assistance of LLMs and is human-validated. It comprises 1000 pairs of ambiguous tasks and their unambiguous counterparts, categorized by ambiguity type (Human Preferences, Common Sense Knowledge, Safety), with environment descriptions, clarifying questions and answers, user intents, and task plans, for a total of 2000 tasks. We hope that AmbiK will enable researchers to perform a unified comparison of ambiguity detection methods. AmbiK is available at https://github.com/cog-model/AmbiK-dataset.
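
To make the described structure concrete, here is a minimal Python sketch of how one AmbiK-style pair might be represented in memory, and how 1000 pairs yield 2000 tasks. All class and field names (`TaskPair`, `ambiguous_task`, and so on) are assumptions chosen for illustration, not the actual schema of the released dataset, and the sample instruction is made up rather than taken from AmbiK.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class AmbiguityType(Enum):
    # The three ambiguity categories named in the abstract.
    HUMAN_PREFERENCES = "Human Preferences"
    COMMON_SENSE_KNOWLEDGE = "Common Sense Knowledge"
    SAFETY = "Safety"


@dataclass
class TaskPair:
    # Hypothetical field names; the real dataset columns may differ.
    environment_description: str
    ambiguous_task: str
    unambiguous_task: str
    ambiguity_type: AmbiguityType
    clarifying_question: str
    clarifying_answer: str
    user_intent: str
    task_plan: List[str]


def flatten(pairs: List[TaskPair]) -> List[Tuple[str, bool]]:
    """Return (task_text, is_ambiguous) tuples, two per pair."""
    tasks = []
    for pair in pairs:
        tasks.append((pair.ambiguous_task, True))
        tasks.append((pair.unambiguous_task, False))
    return tasks


# Made-up example for illustration only (not an AmbiK entry).
example = TaskPair(
    environment_description="a fridge with orange juice and milk, a glass",
    ambiguous_task="Pour me a drink.",
    unambiguous_task="Pour me a glass of orange juice.",
    ambiguity_type=AmbiguityType.HUMAN_PREFERENCES,
    clarifying_question="Which drink would you like?",
    clarifying_answer="Orange juice.",
    user_intent="Get a glass of orange juice.",
    task_plan=["take the glass", "open the fridge", "pour orange juice"],
)

tasks = flatten([example] * 1000)
print(len(tasks))  # 1000 pairs give 2000 tasks
```

Keeping the ambiguous and unambiguous variants in one record means an ambiguity detector can be scored on both sides of each pair: it should ask for clarification on the first and proceed directly on the second.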