
AmbiK: Dataset of Ambiguous Tasks in Kitchen Environment

June 4, 2025
Authors: Anastasiia Ivanova, Eva Bakaeva, Zoya Volovikova, Alexey K. Kovalev, Aleksandr I. Panov
cs.AI

Abstract

As part of an embodied agent, Large Language Models (LLMs) are typically used for behavior planning given natural language instructions from the user. However, dealing with ambiguous instructions in real-world environments remains a challenge for LLMs. Various methods for task ambiguity detection have been proposed, but they are difficult to compare because they are tested on different datasets and no universal benchmark exists. For this reason, we propose AmbiK (Ambiguous Tasks in Kitchen Environment), a fully textual dataset of ambiguous instructions addressed to a robot in a kitchen environment. AmbiK was collected with the assistance of LLMs and is human-validated. It comprises 1000 pairs of ambiguous tasks and their unambiguous counterparts, for a total of 2000 tasks. The pairs are categorized by ambiguity type (Human Preferences, Common Sense Knowledge, Safety) and come with environment descriptions, clarifying questions and answers, user intents, and task plans. We hope that AmbiK will enable researchers to perform a unified comparison of ambiguity detection methods. AmbiK is available at https://github.com/cog-model/AmbiK-dataset.
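To make the described structure concrete, here is a minimal Python sketch of how one task pair might be represented. It is not the dataset's actual schema: every class and field name below is hypothetical and only mirrors the components listed in the abstract, and the sample values are invented for illustration. Attaching a single plan and a single question and answer to each pair is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class AmbiguityType(Enum):
    # The three ambiguity categories named in the abstract.
    HUMAN_PREFERENCES = "Human Preferences"
    COMMON_SENSE_KNOWLEDGE = "Common Sense Knowledge"
    SAFETY = "Safety"


@dataclass
class TaskPair:
    # Hypothetical field names; the real dataset may name or group these differently.
    environment_description: str
    ambiguity_type: AmbiguityType
    ambiguous_task: str
    unambiguous_task: str
    clarifying_question: str
    clarifying_answer: str
    user_intent: str
    task_plan: list[str]


def count_tasks(pairs: list[TaskPair]) -> int:
    # Each pair contributes one ambiguous and one unambiguous task.
    return 2 * len(pairs)


# Invented example record, not taken from the dataset.
example = TaskPair(
    environment_description="a mug, a glass, a kettle, tea bags",
    ambiguity_type=AmbiguityType.HUMAN_PREFERENCES,
    ambiguous_task="Pour me some tea.",
    unambiguous_task="Pour me some tea into the mug.",
    clarifying_question="Should I use the mug or the glass?",
    clarifying_answer="The mug.",
    user_intent="Get tea served in the mug.",
    task_plan=["Boil water in the kettle.", "Put a tea bag in the mug.", "Pour water into the mug."],
)
```

Under this layout, 1000 pairs yield `count_tasks(...) == 2000`, which matches the total stated in the abstract.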