FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow

Abstract

Front-end engineering involves a complex workflow where engineers conceptualize designs, translate them into code, and iteratively refine the implementation. While recent benchmarks primarily focus on converting visual designs to code, we present FullFront, a benchmark designed to evaluate Multimodal Large Language Models (MLLMs) across the full front-end development pipeline. FullFront assesses three fundamental tasks that map directly to the front-end engineering pipeline: Webpage Design (conceptualization phase), Webpage Perception QA (comprehension of visual organization and elements), and Webpage Code Generation (implementation phase). Unlike existing benchmarks that use either scraped websites with bloated code or oversimplified LLM-generated HTML, FullFront employs a novel two-stage process to transform real-world webpages into clean, standardized HTML while maintaining diverse visual designs and avoiding copyright issues. Extensive testing of state-of-the-art MLLMs reveals significant limitations in page perception, code generation (particularly for image handling and layout), and interaction implementation. Our results quantitatively demonstrate performance disparities across models and tasks, and highlight a substantial gap between current MLLM capabilities and human expert performance in front-end engineering. The FullFront benchmark and code are available at https://github.com/Mikivishy/FullFront.