Can Agent Conquer Web? Exploring the Frontiers of ChatGPT Atlas Agent in Web Games
October 30, 2025
Authors: Jingran Zhang, Ning Li, Justin Cui
cs.AI
Abstract
OpenAI's ChatGPT Atlas introduces new capabilities for web interaction,
enabling the model to analyze webpages, process user intents, and execute
cursor and keyboard inputs directly within the browser. While its capacity for
information retrieval tasks has been demonstrated, its performance in dynamic,
interactive environments remains less explored. In this study, we conduct an
early evaluation of Atlas's web interaction capabilities using browser-based
games as test scenarios, including Google's T-Rex Runner, Sudoku, Flappy Bird,
and Stein.world. We employ in-game performance scores as quantitative metrics
to assess performance across different task types. Our results show that Atlas
performs strongly in logical reasoning tasks like Sudoku, completing puzzles
significantly faster than human baselines, but struggles substantially in
real-time games requiring precise timing and motor control, often failing to
progress beyond initial obstacles. These findings suggest that while Atlas
handles analytical processing capably, it has notable limitations in dynamic
web environments that require real-time interaction. The project website is
at https://atlas-game-eval.github.io.