ColorAgent: Building a Robust, Personalized, and Interactive OS Agent

Abstract

With advances in hardware, software, and large language model technologies, interaction between humans and operating systems has evolved from the command-line interface to rapidly emerging AI agent interaction. Building an operating system (OS) agent capable of executing user instructions and faithfully following user intent is becoming a reality. In this technical report, we present ColorAgent, an OS agent designed to engage in long-horizon, robust interactions with the environment while also enabling personalized and proactive user interaction. To enable long-horizon interactions with the environment, we enhance the model's capabilities through step-wise reinforcement learning and self-evolving training, and we develop a tailored multi-agent framework that ensures generality, consistency, and robustness. For user interaction, we explore personalized user intent recognition and proactive engagement, positioning the OS agent not merely as an automation tool but as a warm, collaborative partner. We evaluate ColorAgent on the AndroidWorld and AndroidLab benchmarks, achieving success rates of 77.2% and 50.7%, respectively, and establishing a new state of the art. Nonetheless, we note that current benchmarks are insufficient for a comprehensive evaluation of OS agents, and we propose directions for future work, particularly in evaluation paradigms, agent collaboration, and security. Our code is available at https://github.com/MadeAgents/mobile-use.