GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
June 12, 2024
Authors: Quanfeng Lu, Wenqi Shao, Zitao Liu, Fanqing Meng, Boxuan Li, Botong Chen, Siyuan Huang, Kaipeng Zhang, Yu Qiao, Ping Luo
cs.AI
Abstract
Smartphone users often navigate across multiple applications (apps) to
complete tasks such as sharing content between social media platforms.
Autonomous Graphical User Interface (GUI) navigation agents can enhance user
experience in communication, entertainment, and productivity by streamlining
workflows and reducing manual intervention. However, prior GUI agents were often
trained on datasets comprising simple tasks that can be completed within a
single app, leading to poor performance in cross-app navigation. To address
this problem, we introduce GUI Odyssey, a comprehensive dataset for training
and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735
episodes from 6 mobile devices, spanning 6 types of cross-app tasks, 201 apps,
and 1.4K app combinations. Leveraging GUI Odyssey, we developed OdysseyAgent, a
multimodal cross-app navigation agent, by fine-tuning the Qwen-VL model with a
history resampling module. Extensive experiments demonstrate OdysseyAgent's
superior accuracy compared to existing models. For instance, OdysseyAgent
surpasses fine-tuned Qwen-VL and zero-shot GPT-4V by 1.44% and 55.49%,
respectively, in in-domain accuracy, and by 2.29% and 48.14% in out-of-domain
accuracy, on average.
The dataset and code will be released at
https://github.com/OpenGVLab/GUI-Odyssey.