WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation
November 9, 2025
Authors: Mingde Xu, Zhen Yang, Wenyi Hong, Lihang Pan, Xinyue Fan, Yan Wang, Xiaotao Gu, Bin Xu, Jie Tang
cs.AI
Abstract
User interface (UI) development requires translating design mockups into functional code, a process that remains repetitive and labor-intensive. While recent Vision-Language Models (VLMs) automate UI-to-Code generation, they generate only static HTML/CSS/JavaScript layouts lacking interactivity. To address this, we propose WebVIA, the first agentic framework for interactive UI-to-Code generation and validation. The framework comprises three components: 1) an exploration agent to capture multi-state UI screenshots; 2) a UI2Code model that generates executable interactive code; 3) a validation module that verifies the interactivity. Experiments demonstrate that WebVIA-Agent achieves more stable and accurate UI exploration than general-purpose agents (e.g., Gemini-2.5-Pro). In addition, our fine-tuned WebVIA-UI2Code models exhibit substantial improvements in generating executable and interactive HTML/CSS/JavaScript code, outperforming their base counterparts across both interactive and static UI2Code benchmarks. Our code and models are available at https://webvia.github.io (https://zheny2751-dotcom.github.io/webvia.github.io/).
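The abstract does not describe how the three components are implemented, so the following is only a toy sketch of the explore-then-validate idea, not WebVIA's code or API. It assumes a UI can be modeled as a small state-transition graph: exploration records every (state, action) transition of a reference UI, and validation replays those actions against a stand-in for the generated code. All names (`explore`, `validate`, `REFERENCE`, `generated_step`) are hypothetical, and real screenshots and model calls are replaced by plain strings.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a UI state is a string label, and an interaction graph
# maps (state, action) -> next state. This is an assumption for illustration.
State = str
Action = str
Graph = Dict[Tuple[State, Action], State]


def explore(start: State,
            actions: Callable[[State], List[Action]],
            step: Callable[[State, Action], State]) -> Graph:
    """Toy stage 1: breadth-first exploration that records every transition."""
    graph: Graph = {}
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for action in actions(state):
            nxt = step(state, action)
            graph[(state, action)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return graph


def validate(reference: Graph,
             step: Callable[[State, Action], State]) -> List[Tuple[State, Action]]:
    """Toy stage 3: replay each recorded action and return the mismatches."""
    return [(s, a) for (s, a), expected in reference.items()
            if step(s, a) != expected]


# A toy reference UI: a modal dialog that can be opened and closed.
REFERENCE: Graph = {
    ("home", "click_open"): "modal",
    ("modal", "click_close"): "home",
}


def reference_actions(state: State) -> List[Action]:
    return [a for (s, a) in REFERENCE if s == state]


def reference_step(state: State, action: Action) -> State:
    return REFERENCE[(state, action)]


def generated_step(state: State, action: Action) -> State:
    """Stand-in for stage 2 output: generated code with a broken close button."""
    if (state, action) == ("modal", "click_close"):
        return "modal"
    return REFERENCE[(state, action)]


graph = explore("home", reference_actions, reference_step)
failures = validate(graph, generated_step)
print(failures)  # [('modal', 'click_close')]
```

In this sketch the validator flags the one interaction whose resulting state differs from the explored reference, which is the kind of signal a verification step could feed back to a code generator.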