Code2World: A GUI World Model via Renderable Code Generation

February 10, 2026
Authors: Yuhao Zheng, Li'an Zhong, Yi Wang, Rui Dai, Kaikui Liu, Xiangxiang Chu, Linyuan Lv, Philip Torr, Kevin Qinghong Lin
cs.AI

Abstract

Autonomous GUI agents interact with environments by perceiving interfaces and executing actions. Acting as a virtual sandbox, a GUI world model gives agents human-like foresight by enabling action-conditioned prediction. However, existing text- and pixel-based approaches struggle to achieve high visual fidelity and fine-grained structural controllability at the same time. To this end, we propose Code2World, a vision-language coder that simulates the next visual state via renderable code generation. Specifically, to address data scarcity, we construct AndroidCode by translating GUI trajectories into high-fidelity HTML and refining the synthesized code through a visual-feedback revision mechanism, yielding a corpus of over 80K high-quality screen-action pairs. To adapt existing VLMs for code prediction, we first perform SFT as a cold start for format and layout following, then apply Render-Aware Reinforcement Learning, which uses the rendered outcome as the reward signal to enforce visual semantic fidelity and action consistency. Extensive experiments demonstrate that Code2World-8B achieves top-performing next-UI prediction, rivaling the competitive GPT-5 and Gemini-3-Pro-Image. Notably, Code2World significantly enhances downstream navigation success rates in a flexible manner, improving Gemini-2.5-Flash by 9.5% on AndroidWorld navigation. The code is available at https://github.com/AMAP-ML/Code2World.
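
The abstract states only that the reward is computed from the rendered outcome and covers visual semantic fidelity and action consistency; it does not give a formula. The following is a minimal sketch of how such a reward could be wired together, not the authors' implementation. Every name here (`RewardConfig`, `render_aware_reward`, the `render`, `visual_fidelity`, and `action_consistency` callables), the weighted-sum combination, the default weights, and the zero reward for unrenderable code are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Stand-in type for a rendered screenshot (hypothetical; any image type would do).
Image = bytes


@dataclass(frozen=True)
class RewardConfig:
    """Hypothetical knobs; none of these values come from the paper."""
    visual_weight: float = 0.5    # assumed weight on visual semantic fidelity
    action_weight: float = 0.5    # assumed weight on action consistency
    failure_reward: float = 0.0   # assumed reward when the code fails to render


def render_aware_reward(
    predicted_html: str,
    target_screenshot: Image,
    action: str,
    render: Callable[[str], Optional[Image]],
    visual_fidelity: Callable[[Image, Image], float],
    action_consistency: Callable[[Image, str], float],
    config: RewardConfig = RewardConfig(),
) -> float:
    """Score generated HTML by what it renders to, not by its text.

    `render` returns None when the code cannot be rendered. The two scorers
    are assumed to return values in [0, 1]; in practice they might be backed
    by a judge model, but that is outside this sketch.
    """
    rendered = render(predicted_html)
    if rendered is None:
        # Unrenderable code gets the fixed failure reward.
        return config.failure_reward

    visual = visual_fidelity(rendered, target_screenshot)
    consistent = action_consistency(rendered, action)

    # Assumed combination rule: a simple weighted sum of the two signals.
    return config.visual_weight * visual + config.action_weight * consistent
```

Passing the renderer and the two scorers in as callables keeps the sketch independent of any particular browser engine or judge model, so each piece can be replaced without touching the combination logic.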
February 12, 2026