AmbiK: Dataset of Ambiguous Tasks in Kitchen Environment

Abstract

Large Language Models (LLMs) are commonly used within embodied agents to plan behavior from a user's natural language instructions. However, handling ambiguous instructions in real-world environments remains a challenge for LLMs. Various methods for task ambiguity detection have been proposed, but they are difficult to compare because they are tested on different datasets and no universal benchmark exists. For this reason, we propose AmbiK (Ambiguous Tasks in Kitchen Environment), a fully textual dataset of ambiguous instructions addressed to a robot in a kitchen environment. AmbiK was collected with the assistance of LLMs and is human-validated. It comprises 1000 pairs of ambiguous tasks and their unambiguous counterparts, for a total of 2000 tasks, categorized by ambiguity type (Human Preferences, Common Sense Knowledge, Safety) and accompanied by environment descriptions, clarifying questions and answers, user intents, and task plans. We hope that AmbiK will enable researchers to perform a unified comparison of ambiguity detection methods. AmbiK is available at https://github.com/cog-model/AmbiK-dataset.
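To make the composition of the dataset concrete, the following is a minimal sketch of how one ambiguous/unambiguous pair might be represented in Python. The class names, field names, and the `count_tasks` helper are assumptions chosen for illustration; they are not the column names or API of the released AmbiK files, which should be checked in the repository. The example record is invented and is not taken from the dataset.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class AmbiguityType(Enum):
    # The three ambiguity categories named in the abstract.
    HUMAN_PREFERENCES = "Human Preferences"
    COMMON_SENSE_KNOWLEDGE = "Common Sense Knowledge"
    SAFETY = "Safety"


@dataclass
class TaskPair:
    """One ambiguous task with its unambiguous counterpart.

    Hypothetical schema: the field names are illustrative assumptions,
    not the actual column names of the AmbiK release.
    """
    environment_description: str
    ambiguous_task: str
    unambiguous_task: str
    ambiguity_type: AmbiguityType
    clarifying_question: str
    answer: str
    user_intent: str
    task_plan: List[str]


def count_tasks(pairs: List[TaskPair]) -> int:
    # Each pair contributes two tasks: the ambiguous one and its counterpart.
    return 2 * len(pairs)


# Invented record for illustration only; not an entry from AmbiK.
example = TaskPair(
    environment_description="a ceramic mug, a glass cup, a kettle",
    ambiguous_task="Pour hot water into the cup.",
    unambiguous_task="Pour hot water into the ceramic mug.",
    ambiguity_type=AmbiguityType.HUMAN_PREFERENCES,
    clarifying_question="Which cup should I use?",
    answer="The ceramic mug.",
    user_intent="Get hot water in the ceramic mug.",
    task_plan=["Pick up the kettle.", "Pour water into the ceramic mug."],
)
```

Under this representation, the 1000 pairs described above correspond to 2000 tasks, which is what `count_tasks` computes for a list of 1000 records.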
