UI-Venus-1.5 Technical Report
February 9, 2026
Authors: Venus-Team, Changlong Gao, Zhangxuan Gu, Yulin Liu, Xinyu Qiu, Shuheng Shen, Yue Wen, Tianyu Xia, Zhenyu Xu, Zhengwen Zeng, Beitong Zhou, Xingran Zhou, Weizhi Chen, Sunhao Dai, Jingya Dou, Yichen Gong, Yuan Guo, Zhenlin Guo, Feng Li, Qian Li, Jinzhen Lin, Yuqi Zhou, Linchao Zhu, Liang Chen, Zhenyu Guo, Changhua Meng, Weiqiang Wang
cs.AI
Abstract
GUI agents have emerged as a powerful paradigm for automating interactions in digital environments, yet achieving both broad generality and consistently strong task performance remains challenging. In this report, we present UI-Venus-1.5, a unified, end-to-end GUI Agent designed for robust real-world applications. The proposed model family comprises two dense variants (2B and 8B) and one mixture-of-experts variant (30B-A3B) to serve a range of downstream application scenarios. Compared to our previous version, UI-Venus-1.5 introduces three key technical advances: (1) a comprehensive Mid-Training stage leveraging 10 billion tokens across 30+ datasets to establish foundational GUI semantics; (2) Online Reinforcement Learning with full-trajectory rollouts, aligning training objectives with long-horizon, dynamic navigation in large-scale environments; and (3) a single unified GUI Agent constructed via Model Merging, which synthesizes domain-specific models (grounding, web, and mobile) into one cohesive checkpoint. Extensive evaluations demonstrate that UI-Venus-1.5 establishes new state-of-the-art performance on benchmarks such as ScreenSpot-Pro (69.6%), VenusBench-GD (75.0%), and AndroidWorld (77.6%), significantly outperforming previous strong baselines. In addition, UI-Venus-1.5 demonstrates robust navigation capabilities across a variety of Chinese mobile apps, effectively executing user instructions in real-world scenarios. Code: https://github.com/inclusionAI/UI-Venus; Model: https://huggingface.co/collections/inclusionAI/ui-venus
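The abstract does not say which Model Merging procedure is used, so the following is only a minimal sketch of one common option, weighted parameter averaging, under the assumption that the grounding, web, and mobile models share an identical architecture and parameter names. The function name `merge_checkpoints`, the toy checkpoints, and the merge weights are hypothetical illustrations, not taken from the report or its code release.

```python
from typing import Dict, List, Sequence

# A toy "state dict": parameter name -> flat list of floats.
# Real checkpoints hold tensors; plain lists keep the sketch dependency-free.
StateDict = Dict[str, List[float]]


def merge_checkpoints(checkpoints: Sequence[StateDict],
                      weights: Sequence[float]) -> StateDict:
    """Weighted average of same-shaped checkpoints (assumed merge rule)."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    if len(checkpoints) != len(weights):
        raise ValueError("one weight per checkpoint is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize so the merged parameters stay on the same scale.
    norm = [w / total for w in weights]

    names = set(checkpoints[0])
    for ckpt in checkpoints[1:]:
        if set(ckpt) != names:
            raise ValueError("checkpoints must share parameter names")

    merged: StateDict = {}
    for name in checkpoints[0]:
        length = len(checkpoints[0][name])
        if any(len(ckpt[name]) != length for ckpt in checkpoints):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted sum across the domain-specific models.
        merged[name] = [
            sum(w * ckpt[name][i] for w, ckpt in zip(norm, checkpoints))
            for i in range(length)
        ]
    return merged


# Hypothetical domain-specific checkpoints with a single parameter each.
grounding = {"layer.weight": [1.0, 2.0]}
web = {"layer.weight": [3.0, 4.0]}
mobile = {"layer.weight": [5.0, 6.0]}

unified = merge_checkpoints([grounding, web, mobile], [1.0, 1.0, 1.0])
```

With equal weights this reduces to a plain mean of the three models; unequal weights would bias the unified checkpoint toward one domain. Other merge rules exist, and the report itself should be consulted for the procedure actually used.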