UI2Code^N: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation
November 11, 2025
Authors: Zhen Yang, Wenyi Hong, Mingde Xu, Xinyue Fan, Weihan Wang, Jiele Cheng, Xiaotao Gu, Jie Tang
cs.AI
Abstract
User interface (UI) programming is a core yet highly complex part of modern software development. Recent advances in visual language models (VLMs) highlight the potential of automatic UI coding, but current approaches face two key limitations: multimodal coding capabilities remain underdeveloped, and single-turn paradigms make little use of iterative visual feedback. We address these challenges with an interactive UI-to-code paradigm that better reflects real-world workflows and raises the upper bound of achievable performance. Under this paradigm, we present UI2Code^N, a visual language model trained through staged pretraining, fine-tuning, and reinforcement learning to achieve foundational improvements in multimodal coding. The model unifies three key capabilities: UI-to-code generation, UI editing, and UI polishing. We further explore test-time scaling for interactive generation, enabling systematic use of multi-turn feedback. Experiments on UI-to-code and UI polishing benchmarks show that UI2Code^N establishes a new state of the art among open-source models and achieves performance comparable to leading closed-source models such as Claude-4-Sonnet and GPT-5. Our code and models are available at https://github.com/zai-org/UI2Code_N.
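The interactive paradigm described above reduces to a generate-render-polish loop: produce code from a target screenshot, render it, and feed the rendered result back to the model as visual feedback for revision. The sketch below is a minimal illustration of that loop under assumptions, not the UI2Code^N API; all function names (generate_code, render_screenshot, polish_code, interactive_ui_to_code) are hypothetical placeholders.

```python
# Minimal sketch of the interactive UI-to-code loop from the abstract:
# generate -> render -> polish, with multi-turn visual feedback.
# Every function body here is a hypothetical stub, not the UI2Code^N API.

def generate_code(target: bytes) -> str:
    """Hypothetical VLM call: map a target UI screenshot to code."""
    raise NotImplementedError

def render_screenshot(code: str) -> bytes:
    """Hypothetical renderer: run the code (e.g. HTML) and capture an image."""
    raise NotImplementedError

def polish_code(code: str, rendered: bytes, target: bytes) -> str:
    """Hypothetical VLM call: compare the rendering to the target and revise."""
    raise NotImplementedError

def interactive_ui_to_code(target: bytes, rounds: int = 4) -> str:
    # Round 0: single-turn UI-to-code generation from the screenshot.
    code = generate_code(target)
    # Test-time scaling: each extra round feeds the rendered output back
    # to the model as visual feedback for another polishing pass.
    for _ in range(rounds):
        rendered = render_screenshot(code)
        code = polish_code(code, rendered, target)
    return code
```

On this reading, increasing the number of polishing rounds trades additional inference compute for higher visual fidelity, which is what "test-time scalable" in the title appears to refer to.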