ColorAgent: Building A Robust, Personalized, and Interactive OS Agent

Abstract

With advances in hardware, software, and large language model technologies, interaction between humans and operating systems has evolved from command-line interfaces to rapidly emerging AI agent interaction. Building an operating system (OS) agent capable of executing user instructions and faithfully following user desires is becoming a reality. In this technical report, we present ColorAgent, an OS agent designed to engage in long-horizon, robust interactions with the environment while also enabling personalized and proactive user interaction. To enable long-horizon interactions with the environment, we enhance the model's capabilities through step-wise reinforcement learning and self-evolving training, while also developing a tailored multi-agent framework that ensures generality, consistency, and robustness. In terms of user interaction, we explore personalized user intent recognition and proactive engagement, positioning the OS agent not merely as an automation tool but as a warm, collaborative partner. We evaluate ColorAgent on the AndroidWorld and AndroidLab benchmarks, achieving success rates of 77.2% and 50.7%, respectively, and establishing a new state of the art. Nonetheless, we note that current benchmarks are insufficient for a comprehensive evaluation of OS agents, and we propose directions for future exploration, particularly evaluation paradigms, agent collaboration, and security. Our code is available at https://github.com/MadeAgents/mobile-use.
