MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation

April 16, 2026
Authors: Yan Li, Zezi Zeng, Yifan Yang, Yuqing Yang, Ning Liao, Weiwei Guo, Lili Qiu, Mingxi Cheng, Qi Dai, Zhendong Wang, Zhengyuan Yang, Xue Yang, Ji Li, Lijuan Wang, Chong Luo
cs.AI

Abstract

The rapid progress of Artificial Intelligence Generated Content (AIGC) tools enables images, videos, and visualizations to be created on demand for webpage design, offering a flexible and increasingly adopted paradigm for modern UI/UX. However, directly integrating such tools into automated webpage generation often leads to style inconsistency and poor global coherence, as elements are generated in isolation. We propose MM-WebAgent, a hierarchical agentic framework for multimodal webpage generation that coordinates AIGC-based element generation through hierarchical planning and iterative self-reflection. MM-WebAgent jointly optimizes global layout, local multimodal content, and their integration, producing coherent and visually consistent webpages. We further introduce a benchmark for multimodal webpage generation and a multi-level evaluation protocol for systematic assessment. Experiments demonstrate that MM-WebAgent outperforms code-generation and agent-based baselines, especially on multimodal element generation and integration. Code & Data: https://aka.ms/mm-webagent.
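
As a rough illustration of the idea described in the abstract (a global plan that conditions each locally generated element, followed by iterative self-reflection), the following is a minimal sketch. It is not the authors' implementation: the names `PagePlan`, `ElementSpec`, `plan_page`, and `generate_with_reflection`, as well as the `generate` and `critique` callables, are hypothetical stand-ins for whatever planner, AIGC tools, and critic the actual system uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ElementSpec:
    """One multimodal element in the page plan (hypothetical structure)."""
    name: str
    kind: str    # e.g. "image", "video", "chart"
    prompt: str  # local generation prompt derived from the global plan


@dataclass
class PagePlan:
    """Global plan: a shared style plus the elements to generate (hypothetical)."""
    style: str
    elements: List[ElementSpec] = field(default_factory=list)


def plan_page(request: str, style: str, wanted: Dict[str, str]) -> PagePlan:
    """Global planning step: every local prompt inherits the page-level style."""
    elements = [
        ElementSpec(name, kind, f"{kind} for '{request}' in {style} style")
        for name, kind in wanted.items()
    ]
    return PagePlan(style, elements)


def generate_with_reflection(
    plan: PagePlan,
    generate: Callable[[str, str], str],       # (kind, prompt) -> asset; assumed tool wrapper
    critique: Callable[[str, PagePlan], str],  # returns "" when the asset is acceptable
    max_rounds: int = 3,
) -> Dict[str, str]:
    """Local generation with a bounded self-reflection loop per element."""
    assets: Dict[str, str] = {}
    for spec in plan.elements:
        prompt = spec.prompt
        asset = generate(spec.kind, prompt)
        for _ in range(max_rounds):
            feedback = critique(asset, plan)
            if not feedback:
                break  # asset is consistent with the global plan
            # Fold the critic's feedback into the prompt and regenerate.
            prompt = f"{prompt}; fix: {feedback}"
            asset = generate(spec.kind, prompt)
        assets[spec.name] = asset
    return assets
```

In this sketch the critic only checks each element against the global plan; a fuller version would presumably also reflect on the page layout and on how the elements are integrated into it, which the abstract names as separate optimization targets.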