GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
June 12, 2024
Authors: Quanfeng Lu, Wenqi Shao, Zitao Liu, Fanqing Meng, Boxuan Li, Botong Chen, Siyuan Huang, Kaipeng Zhang, Yu Qiao, Ping Luo
cs.AI
Abstract
Smartphone users often navigate across multiple applications (apps) to
complete tasks such as sharing content between social media platforms.
Autonomous Graphical User Interface (GUI) navigation agents can enhance user
experience in communication, entertainment, and productivity by streamlining
workflows and reducing manual intervention. However, prior GUI agents are often
trained on datasets comprising simple tasks that can be completed within a
single app, leading to poor performance in cross-app navigation. To address
this problem, we introduce GUI Odyssey, a comprehensive dataset for training
and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735
episodes from 6 mobile devices, spanning 6 types of cross-app tasks, 201 apps,
and 1.4K app combos. Leveraging GUI Odyssey, we develop OdysseyAgent, a
multimodal cross-app navigation agent, by fine-tuning the Qwen-VL model with a
history resampling module. Extensive experiments demonstrate OdysseyAgent's
superior accuracy compared to existing models. For instance, OdysseyAgent
surpasses fine-tuned Qwen-VL and zero-shot GPT-4V in in-domain accuracy by
1.44% and 55.49%, respectively, and in out-of-domain accuracy by 2.29% and
48.14% on average.
The dataset and code will be released at
https://github.com/OpenGVLab/GUI-Odyssey.