UI2Code^N: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation
November 11, 2025
Authors: Zhen Yang, Wenyi Hong, Mingde Xu, Xinyue Fan, Weihan Wang, Jiele Cheng, Xiaotao Gu, Jie Tang
cs.AI
Abstract
User interface (UI) programming is a core yet highly complex part of modern software development. Recent advances in visual language models (VLMs) highlight the potential of automatic UI coding, but current approaches face two key limitations: multimodal coding capabilities remain underdeveloped, and single-turn paradigms make little use of iterative visual feedback. We address these challenges with an interactive UI-to-code paradigm that better reflects real-world workflows and raises the upper bound of achievable performance. Under this paradigm, we present UI2Code^N, a visual language model trained through staged pretraining, fine-tuning, and reinforcement learning to achieve foundational improvements in multimodal coding. The model unifies three key capabilities: UI-to-code generation, UI editing, and UI polishing. We further explore test-time scaling for interactive generation, enabling systematic use of multi-turn feedback. Experiments on UI-to-code and UI polishing benchmarks show that UI2Code^N establishes a new state of the art among open-source models and achieves performance comparable to leading closed-source models such as Claude-4-Sonnet and GPT-5. Our code and models are available at https://github.com/zai-org/UI2Code_N.
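As a concrete illustration of the interactive paradigm described above, the sketch below shows one way a generate-render-refine loop could systematically consume multi-turn visual feedback: the model produces code, the code is rendered, and the rendered screenshot is compared against the target and fed back for polishing. This is a minimal sketch under assumed interfaces; the callables `generate`, `polish`, `render`, and `similarity` are hypothetical placeholders, not the UI2Code^N API.

```python
from typing import Callable

def interactive_ui_to_code(
    target: bytes,                                # target UI screenshot
    generate: Callable[[bytes], str],             # initial UI-to-code pass
    polish: Callable[[str, bytes, bytes], str],   # revise code given target + rendered views
    render: Callable[[str], bytes],               # render candidate code to a screenshot
    similarity: Callable[[bytes, bytes], float],  # visual similarity score in [0, 1]
    max_turns: int = 4,
    threshold: float = 0.95,
) -> str:
    """Generate-render-refine loop: each additional turn spends more
    test-time compute on iterative visual feedback."""
    code = generate(target)
    for _ in range(max_turns):
        rendered = render(code)
        if similarity(rendered, target) >= threshold:
            break  # candidate is visually close enough; stop early
        # Polishing step: the model sees both the target screenshot and
        # its own render, then revises the code accordingly.
        code = polish(code, target, rendered)
    return code
```

Under this framing, raising `max_turns` is the test-time scaling knob: more feedback rounds trade compute for fidelity, which is what allows the interactive paradigm to raise the performance ceiling over single-turn generation.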