AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents
July 3, 2024
Authors: Yuxiang Chai, Siyuan Huang, Yazhe Niu, Han Xiao, Liang Liu, Dingyu Zhang, Peng Gao, Shuai Ren, Hongsheng Li
cs.AI
Abstract
AI agents have drawn increasing attention mostly for their ability to perceive
environments, understand tasks, and autonomously achieve goals. To advance
research on AI agents in mobile scenarios, we introduce the Android
Multi-annotation EXpo (AMEX), a comprehensive, large-scale dataset designed for
generalist mobile GUI-control agents. The dataset is used to train and evaluate
these agents' ability to complete complex tasks by directly interacting with
the graphical user interface (GUI) on mobile devices. AMEX comprises
over 104K high-resolution screenshots from 110 popular mobile applications,
which are annotated at multiple levels. Unlike existing mobile device-control
datasets such as MoTIF and AitW, AMEX includes three levels of annotations:
GUI interactive element grounding, GUI screen and element functionality
descriptions, and complex natural language instructions with stepwise
GUI-action chains, averaging 13 steps each. We develop this dataset from a more
instructive and detailed perspective, complementing the general settings of
existing datasets. Additionally, we develop a baseline model, SPHINX Agent, and
compare its performance against state-of-the-art agents trained on other
datasets. To facilitate further research, we open-source our dataset, models,
and relevant evaluation tools. The project is available at
https://yuxiangchai.github.io/AMEX/
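
The three annotation levels named in the abstract could be modeled roughly as follows. This is only an illustrative sketch: every class, field, and action name here is a hypothetical choice made for this example and is not taken from the released AMEX schema, which should be checked on the project page.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# NOTE: all names below are hypothetical and are not AMEX's actual schema.


@dataclass
class ElementAnnotation:
    """Levels 1 and 2: an interactive element and, optionally, its function."""
    bbox: Tuple[int, int, int, int]  # assumed (left, top, right, bottom) in pixels
    functionality: str = ""          # free-text description, may be empty


@dataclass
class ScreenAnnotation:
    """One screenshot with its element boxes and a screen-level description."""
    screenshot: str                  # path to the image file
    description: str = ""
    elements: List[ElementAnnotation] = field(default_factory=list)


@dataclass
class ActionStep:
    """Level 3: one GUI action taken on a given screen."""
    screen: ScreenAnnotation
    action: str                      # assumed labels such as "tap" or "swipe"
    target: Tuple[int, int] = (0, 0)  # assumed (x, y) point of the action


@dataclass
class Episode:
    """A natural language instruction with its stepwise action chain."""
    instruction: str
    steps: List[ActionStep] = field(default_factory=list)


def mean_steps(episodes: List[Episode]) -> float:
    """Average number of action steps per instruction (0.0 if empty)."""
    if not episodes:
        return 0.0
    return sum(len(e.steps) for e in episodes) / len(episodes)
```

With records shaped like this, a statistic such as the reported average of 13 steps per instruction would correspond to `mean_steps` evaluated over all episodes.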