UI2Code^N: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation

Abstract

User interface (UI) programming is a core yet highly complex part of modern software development. Recent advances in visual language models (VLMs) highlight the potential of automatic UI coding, but current approaches face two key limitations: multimodal coding capabilities remain underdeveloped, and single-turn paradigms make little use of iterative visual feedback. We address these challenges with an interactive UI-to-code paradigm that better reflects real-world workflows and raises the upper bound of achievable performance. Under this paradigm, we present UI2Code^N, a visual language model trained through staged pretraining, fine-tuning, and reinforcement learning to achieve foundational improvements in multimodal coding. The model unifies three key capabilities: UI-to-code generation, UI editing, and UI polishing. We further explore test-time scaling for interactive generation, enabling systematic use of multi-turn feedback. Experiments on UI-to-code and UI polishing benchmarks show that UI2Code^N establishes a new state of the art among open-source models and achieves performance comparable to leading closed-source models such as Claude-4-Sonnet and GPT-5. Our code and models are available at https://github.com/zai-org/UI2Code_N.

