UI2Code^N: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation

Abstract

User interface (UI) programming is a core yet highly complex part of modern software development. Recent advances in visual language models (VLMs) highlight the potential of automatic UI coding, but current approaches face two key limitations: their multimodal coding capabilities remain underdeveloped, and single-turn paradigms make little use of iterative visual feedback. We address these challenges with an interactive UI-to-code paradigm that better reflects real-world workflows and raises the upper bound of achievable performance. Under this paradigm, we present UI2Code^N, a visual language model trained through staged pretraining, fine-tuning, and reinforcement learning to achieve foundational improvements in multimodal coding. The model unifies three key capabilities: UI-to-code generation, UI editing, and UI polishing. We further explore test-time scaling for interactive generation, which enables systematic use of multi-turn feedback. Experiments on UI-to-code and UI polishing benchmarks show that UI2Code^N establishes a new state of the art among open-source models and achieves performance comparable to leading closed-source models such as Claude-4-Sonnet and GPT-5. Our code and models are available at https://github.com/zai-org/UI2Code_N.
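The interactive paradigm described in the abstract (generate code, render it, compare the rendering with the target UI, then revise) can be illustrated with a toy loop. This is a hypothetical sketch, not the authors' implementation: `render`, `similarity`, `propose`, and `interactive_generate` are illustrative names, a "screenshot" is just a tuple of floats, and the VLM is replaced by a simple proposer that uses the rendered-vs-target difference as feedback. Test-time compute is scaled along two assumed knobs: the number of feedback rounds and the number of candidates `k` sampled per round.

```python
from typing import List, Tuple

# Toy stand-ins (all hypothetical, not UI2Code^N's API): a "screenshot" is a
# tuple of pixel-like floats, "code" is a tuple of layout parameters, and
# rendering maps code to a screenshot.
Image = Tuple[float, ...]
Code = Tuple[float, ...]


def render(code: Code) -> Image:
    """Hypothetical renderer; a real system would render HTML in a browser."""
    return tuple(code)


def similarity(a: Image, b: Image) -> float:
    """Hypothetical visual reward: 1 - mean absolute error, clamped at 0."""
    err = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return max(0.0, 1.0 - err)


def propose(target: Image, code: Code, rendered: Image, k: int) -> List[Code]:
    """Hypothetical stand-in for one VLM polishing turn.

    Uses the rendered-vs-target difference as visual feedback and returns k
    candidate revisions that move the code different fractions of the way
    toward the target.
    """
    steps = [(i + 1) / (k + 1) for i in range(k)]
    return [
        tuple(c + s * (t - r) for c, t, r in zip(code, target, rendered))
        for s in steps
    ]


def interactive_generate(
    target: Image, init_code: Code, rounds: int, k: int, threshold: float = 0.99
) -> Tuple[Code, List[float]]:
    """Multi-turn loop: render, compare, sample k revisions, keep the best.

    `rounds` (turns of feedback) and `k` (samples per turn) are the two
    test-time compute knobs in this sketch.
    """
    code = init_code
    best = similarity(render(code), target)
    history = [best]
    for _ in range(rounds):
        rendered = render(code)
        candidates = propose(target, code, rendered, k)
        # Score every candidate against the target and keep the best one.
        score, cand = max(
            ((similarity(render(c), target), c) for c in candidates),
            key=lambda pair: pair[0],
        )
        if score > best:
            best, code = score, cand
        history.append(best)
        if best >= threshold:
            break  # good enough; stop spending test-time compute
    return code, history


code, history = interactive_generate((1.0, 0.0, 0.5), (0.0, 0.0, 0.0), rounds=5, k=3)
print(history)  # [0.5, 0.875, 0.96875, 0.9921875]
```

In this toy setting, raising `rounds` or `k` spends more inference compute for a closer match. In a real system, `render` would screenshot the generated HTML and `similarity` would be a perceptual or learned visual reward; both are assumptions here.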
