UI2Code^N: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation
November 11, 2025
Authors: Zhen Yang, Wenyi Hong, Mingde Xu, Xinyue Fan, Weihan Wang, Jiele Cheng, Xiaotao Gu, Jie Tang
cs.AI
Abstract
User interface (UI) programming is a core yet highly complex part of modern software development. Recent advances in visual language models (VLMs) highlight the potential of automatic UI coding, but current approaches face two key limitations: multimodal coding capabilities remain underdeveloped, and single-turn paradigms make little use of iterative visual feedback. We address these challenges with an interactive UI-to-code paradigm that better reflects real-world workflows and raises the upper bound of achievable performance. Under this paradigm, we present UI2Code^N, a visual language model trained through staged pretraining, fine-tuning, and reinforcement learning to achieve foundational improvements in multimodal coding. The model unifies three key capabilities: UI-to-code generation, UI editing, and UI polishing. We further explore test-time scaling for interactive generation, enabling systematic use of multi-turn feedback. Experiments on UI-to-code and UI polishing benchmarks show that UI2Code^N establishes a new state of the art among open-source models and achieves performance comparable to leading closed-source models such as Claude-4-Sonnet and GPT-5. Our code and models are available at https://github.com/zai-org/UI2Code_N.
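As a concrete illustration of the interactive paradigm described above, the sketch below shows one way a generate-render-refine loop could systematically consume multi-turn visual feedback: the model produces code, the code is rendered, and the rendered screenshot is compared against the target and fed back for polishing. This is a minimal sketch under assumed interfaces; the callables `generate`, `polish`, `render`, and `similarity` are hypothetical placeholders, not the UI2Code^N API.

```python
from typing import Callable

def interactive_ui_to_code(
    target: bytes,                                # target UI screenshot
    generate: Callable[[bytes], str],             # initial UI-to-code pass
    polish: Callable[[str, bytes, bytes], str],   # revise code given target + rendered views
    render: Callable[[str], bytes],               # render candidate code to a screenshot
    similarity: Callable[[bytes, bytes], float],  # visual similarity score in [0, 1]
    max_turns: int = 4,
    threshold: float = 0.95,
) -> str:
    """Generate-render-refine loop: each additional turn spends more
    test-time compute on iterative visual feedback."""
    code = generate(target)
    for _ in range(max_turns):
        rendered = render(code)
        if similarity(rendered, target) >= threshold:
            break  # candidate is visually close enough; stop early
        # Polishing step: the model sees both the target screenshot and
        # its own render, then revises the code accordingly.
        code = polish(code, target, rendered)
    return code
```

Under this framing, raising `max_turns` is the test-time scaling knob: more feedback rounds trade compute for fidelity, which is what allows the interactive paradigm to raise the performance ceiling over single-turn generation.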