LaTCoder: Converting Webpage Design to Code with Layout-as-Thought

Abstract

Converting webpage designs into code (design-to-code) plays a vital role in User Interface (UI) development for front-end developers, bridging the gap between visual design and functional implementation. While recent Multimodal Large Language Models (MLLMs) have shown significant potential in design-to-code tasks, they often fail to preserve the layout accurately during code generation. To address this, we draw inspiration from Chain-of-Thought (CoT) reasoning in human cognition and propose LaTCoder, a novel approach that enhances layout preservation when generating code from webpage designs by treating layout as thought (Layout-as-Thought, LaT). Specifically, we first introduce a simple yet efficient algorithm to divide the webpage design into image blocks. Next, we prompt MLLMs using a CoT-based approach to generate code for each block. Finally, we apply two assembly strategies, absolute positioning and an MLLM-based method, followed by dynamic selection to determine the optimal output. We evaluate the effectiveness of LaTCoder using multiple backbone MLLMs (i.e., DeepSeek-VL2, Gemini, and GPT-4o) on both a public benchmark and a newly introduced, more challenging benchmark (CC-HARD) that features complex layouts. The experimental results on automatic metrics demonstrate significant improvements. Specifically, with DeepSeek-VL2 as the backbone, TreeBLEU scores increased by 66.67% and MAE decreased by 38% compared to direct prompting. Moreover, the human preference evaluation results indicate that annotators favor the webpages generated by LaTCoder in over 60% of cases, providing strong evidence of the effectiveness of our method.
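To make the assembly step more concrete, the following is a minimal sketch of how per-block code might be combined with absolute positioning and how one candidate page might then be selected. It is an illustration under stated assumptions, not the authors' implementation: the `Block` type, the function names, the pixel-based bounding boxes, and the caller-supplied `score` function (lower is better, for example an MAE against the design image) are all hypothetical, since the abstract does not specify the block format or the selection criterion.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Block:
    """One image block of the design (hypothetical structure)."""
    x: int       # left offset of the block in the design, in pixels
    y: int       # top offset of the block in the design, in pixels
    width: int   # block width in pixels
    height: int  # block height in pixels
    html: str    # code generated for this block (assumed MLLM output)


def assemble_absolute(blocks: Iterable[Block], page_width: int, page_height: int) -> str:
    """Place each block's code at its original coordinates inside one container."""
    parts: List[str] = [
        f'<div style="position:relative;width:{page_width}px;height:{page_height}px;">'
    ]
    for b in blocks:
        # Each block keeps the position and size it had in the design image.
        parts.append(
            f'<div style="position:absolute;left:{b.x}px;top:{b.y}px;'
            f'width:{b.width}px;height:{b.height}px;overflow:hidden;">{b.html}</div>'
        )
    parts.append("</div>")
    return "\n".join(parts)


def select_best(candidates: Iterable[str], score: Callable[[str], float]) -> str:
    """Return the candidate page with the lowest score (assumed: lower is better)."""
    return min(candidates, key=score)
```

In use, a page assembled by `assemble_absolute` and a page produced by an MLLM-based assembly would both be passed to `select_best`, with `score` rendering each candidate and comparing it to the original design.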

