ColorAgent: Building A Robust, Personalized, and Interactive OS Agent
October 22, 2025
Authors: Ning Li, Qiqiang Lin, Zheng Wu, Xiaoyun Mo, Weiming Zhang, Yin Zhao, Xiangmou Qu, Jiamu Zhou, Jun Wang, Congmin Zheng, Yuanyi Song, Hongjiang Chen, Heyuan Huang, Jihong Wang, Jiaxin Yin, Jingwei Yu, Junwei Liao, Qiuying Peng, Xingyu Lou, Jun Wang, Weiwen Liu, Zhuosheng Zhang, Weinan Zhang
cs.AI
Abstract
With the advancements in hardware, software, and large language model
technologies, the interaction between humans and operating systems has evolved
from the command-line interface to rapidly emerging AI agent interactions.
Building an operating system (OS) agent capable of executing user instructions
and faithfully following user desires is becoming a reality. In this technical
report, we present ColorAgent, an OS agent designed to engage in long-horizon,
robust interactions with the environment while also enabling personalized and
proactive user interaction. To enable long-horizon interactions with the
environment, we enhance the model's capabilities through step-wise
reinforcement learning and self-evolving training, while also developing a
tailored multi-agent framework that ensures generality, consistency, and
robustness. In terms of user interaction, we explore personalized user intent
recognition and proactive engagement, positioning the OS agent not merely as an
automation tool but as a warm, collaborative partner. We evaluate ColorAgent on
the AndroidWorld and AndroidLab benchmarks, achieving success rates of 77.2%
and 50.7%, respectively, establishing a new state of the art. Nonetheless, we
note that current benchmarks are insufficient for a comprehensive evaluation of
OS agents and propose directions for future work, particularly in evaluation
paradigms, agent collaboration, and security. Our
code is available at https://github.com/MadeAgents/mobile-use.