Can Agent Conquer Web? Exploring the Frontiers of ChatGPT Atlas Agent in Web Games
October 30, 2025
Authors: Jingran Zhang, Ning Li, Justin Cui
cs.AI
Abstract
OpenAI's ChatGPT Atlas introduces new capabilities for web interaction,
enabling the model to analyze webpages, process user intents, and execute
cursor and keyboard inputs directly within the browser. While its capacity for
information retrieval tasks has been demonstrated, its performance in dynamic,
interactive environments remains less explored. In this study, we conduct an
early evaluation of Atlas's web interaction capabilities using browser-based
games as test scenarios, including Google's T-Rex Runner, Sudoku, Flappy Bird,
and Stein.world. We use in-game scores as quantitative metrics to compare
performance across different task types. Our results show that Atlas
performs strongly in logical reasoning tasks like Sudoku, completing puzzles
significantly faster than human baselines, but struggles substantially in
real-time games requiring precise timing and motor control, often failing to
progress beyond initial obstacles. These findings suggest that although Atlas
has strong analytical capabilities, it remains notably limited in dynamic web
environments that require real-time interaction. The website of our
project can be found at https://atlas-game-eval.github.io.