WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation

November 9, 2025
Authors: Mingde Xu, Zhen Yang, Wenyi Hong, Lihang Pan, Xinyue Fan, Yan Wang, Xiaotao Gu, Bin Xu, Jie Tang
cs.AI

Abstract

User interface (UI) development requires translating design mockups into functional code, a process that remains repetitive and labor-intensive. Recent Vision-Language Models (VLMs) can automate UI-to-Code generation, but they produce only static HTML/CSS/JavaScript layouts that lack interactivity. To address this, we propose WebVIA, the first agentic framework for interactive UI-to-Code generation and validation. The framework comprises three components: 1) an exploration agent that captures multi-state UI screenshots; 2) a UI2Code model that generates executable interactive code; 3) a validation module that verifies the interactivity. Experiments demonstrate that WebVIA-Agent achieves more stable and accurate UI exploration than general-purpose agents (e.g., Gemini-2.5-Pro). In addition, our fine-tuned WebVIA-UI2Code models show substantial improvements in generating executable, interactive HTML/CSS/JavaScript code, outperforming their base counterparts on both interactive and static UI2Code benchmarks. Our code and models are available at https://zheny2751-dotcom.github.io/webvia.github.io/.
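
The three components listed in the abstract can be read as an explore, generate, validate loop. The following is a minimal sketch of that flow only, not WebVIA's actual implementation or API: every name (`InteractionGraph`, `explore`, `generate_code`, `validate`, `TOY_UI`) is hypothetical, the "UI" is a toy dictionary rather than a browser session, and the UI2Code model is replaced by a stub that returns a transition table in place of real HTML/CSS/JavaScript.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical names throughout; this illustrates the three-stage flow
# described in the abstract and is not WebVIA's real interface.


@dataclass
class InteractionGraph:
    """UI states (stand-ins for screenshots) and the actions linking them."""
    states: List[str] = field(default_factory=list)
    transitions: List[Tuple[str, str, str]] = field(default_factory=list)  # (src, action, dst)


def explore(start: str, step: Callable[[str], Dict[str, str]]) -> InteractionGraph:
    """Stage 1: breadth-first visit of every UI state reachable from `start`.

    `step(state)` returns {action: next_state}. A real agent would drive a
    browser here and capture a screenshot for each state.
    """
    graph = InteractionGraph(states=[start])
    queue = deque([start])
    while queue:
        src = queue.popleft()
        for action, dst in step(src).items():
            graph.transitions.append((src, action, dst))
            if dst not in graph.states:
                graph.states.append(dst)
                queue.append(dst)
    return graph


def generate_code(graph: InteractionGraph) -> Dict[Tuple[str, str], str]:
    """Stage 2: stub for the UI2Code model (assumption: a lookup table
    stands in for generated HTML/CSS/JavaScript)."""
    return {(src, action): dst for src, action, dst in graph.transitions}


def validate(graph: InteractionGraph, code: Dict[Tuple[str, str], str]) -> bool:
    """Stage 3: replay each recorded action against the generated artifact."""
    return all(code.get((src, action)) == dst for src, action, dst in graph.transitions)


# Toy UI used only for illustration: a page with a menu and an "about" view.
TOY_UI = {
    "home": {"click #menu": "menu_open"},
    "menu_open": {"click #close": "home", "click #about": "about"},
    "about": {},
}

graph = explore("home", lambda state: TOY_UI[state])
code = generate_code(graph)
is_valid = validate(graph, code)
```

In this sketch the validation step reuses the recorded transitions as its test cases, which reflects the abstract's point that interactivity is verified rather than assumed; how the actual framework represents states and checks behavior is not specified in the abstract.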