AmbiK: Dataset of Ambiguous Tasks in Kitchen Environment

Abstract

As part of an embodied agent, Large Language Models (LLMs) are typically used for behavior planning given natural language instructions from the user. However, dealing with ambiguous instructions in real-world environments remains a challenge for LLMs. Various methods for task ambiguity detection have been proposed, but they are difficult to compare because they are tested on different datasets and there is no universal benchmark. For this reason, we propose AmbiK (Ambiguous Tasks in Kitchen Environment), a fully textual dataset of ambiguous instructions addressed to a robot in a kitchen environment. AmbiK was collected with the assistance of LLMs and is human-validated. It comprises 1000 pairs of ambiguous tasks and their unambiguous counterparts, for a total of 2000 tasks, categorized by ambiguity type (Human Preferences, Common Sense Knowledge, Safety). Each task comes with an environment description, a clarifying question and answer, the user intent, and a task plan. We hope that AmbiK will enable researchers to perform a unified comparison of ambiguity detection methods. AmbiK is available at https://github.com/cog-model/AmbiK-dataset.
