FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow

Abstract

Front-end engineering involves a complex workflow in which engineers conceptualize designs, translate them into code, and iteratively refine the implementation. While recent benchmarks focus primarily on converting visual designs to code, we present FullFront, a benchmark designed to evaluate Multimodal Large Language Models (MLLMs) across the full front-end development pipeline. FullFront assesses three fundamental tasks that map directly onto that pipeline: Webpage Design (conceptualization phase), Webpage Perception QA (comprehension of visual organization and elements), and Webpage Code Generation (implementation phase). Unlike existing benchmarks, which use either scraped websites with bloated code or oversimplified LLM-generated HTML, FullFront employs a novel two-stage process to transform real-world webpages into clean, standardized HTML while maintaining diverse visual designs and avoiding copyright issues. Extensive testing of state-of-the-art MLLMs reveals significant limitations in page perception, code generation (particularly image handling and layout), and interaction implementation. Our results quantitatively demonstrate performance disparities across models and tasks, and highlight a substantial gap between current MLLM capabilities and human expert performance in front-end engineering. The FullFront benchmark and code are available at https://github.com/Mikivishy/FullFront.
