
LaTCoder: Converting Webpage Design to Code with Layout-as-Thought

August 5, 2025
Authors: Yi Gui, Zhen Li, Zhongyi Zhang, Guohao Wang, Tianpeng Lv, Gaoyang Jiang, Yi Liu, Dongping Chen, Yao Wan, Hongyu Zhang, Wenbin Jiang, Xuanhua Shi, Hai Jin
cs.AI

Abstract

Converting webpage designs into code (design-to-code) plays a vital role in User Interface (UI) development for front-end developers, bridging the gap between visual design and functional implementation. While recent Multimodal Large Language Models (MLLMs) have shown significant potential in design-to-code tasks, they often fail to accurately preserve the layout during code generation. To this end, we draw inspiration from Chain-of-Thought (CoT) reasoning in human cognition and propose LaTCoder, a novel approach that enhances layout preservation in webpage design during code generation with Layout-as-Thought (LaT). Specifically, we first introduce a simple yet efficient algorithm to divide the webpage design into image blocks. Next, we prompt MLLMs using a CoT-based approach to generate code for each block. Finally, we apply two assembly strategies, absolute positioning and an MLLM-based method, followed by dynamic selection to determine the optimal output. We evaluate the effectiveness of LaTCoder using multiple backbone MLLMs (i.e., DeepSeek-VL2, Gemini, and GPT-4o) on both a public benchmark and a newly introduced, more challenging benchmark (CC-HARD) that features complex layouts. The experimental results on automatic metrics demonstrate significant improvements: TreeBLEU scores increased by 66.67% and MAE decreased by 38% when using DeepSeek-VL2, compared to direct prompting. Moreover, human preference evaluation results indicate that annotators favor the webpages generated by LaTCoder in over 60% of cases, providing strong evidence of the effectiveness of our method.
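The three-stage pipeline sketched in the abstract (divide the design into blocks, generate code per block, reassemble with absolute positioning) can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace-based vertical splitting heuristic and the `generate` callback (which stands in for a per-block MLLM call) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Block:
    x: int      # position and size of the block in the design image
    y: int
    w: int
    h: int
    html: str = ""  # code generated for this block

def divide_into_blocks(rows_blank: List[bool], width: int) -> List[Block]:
    """Split the design vertically at runs of blank rows.

    A toy stand-in for the paper's block-division algorithm:
    `rows_blank[y]` is True when pixel row y contains no content.
    """
    blocks: List[Block] = []
    start = None
    for y, blank in enumerate(rows_blank):
        if not blank and start is None:
            start = y                       # a content region begins
        elif blank and start is not None:
            blocks.append(Block(0, start, width, y - start))
            start = None                    # the region ended
    if start is not None:                   # region runs to the bottom edge
        blocks.append(Block(0, start, width, len(rows_blank) - start))
    return blocks

def assemble_absolute(blocks: List[Block]) -> str:
    """Assemble per-block snippets using CSS absolute positioning,
    placing each block at its original coordinates in the design."""
    divs = "\n".join(
        f'<div style="position:absolute; left:{b.x}px; top:{b.y}px; '
        f'width:{b.w}px; height:{b.h}px;">{b.html}</div>'
        for b in blocks
    )
    return f'<body style="position:relative; margin:0;">\n{divs}\n</body>'

def lat_coder_sketch(rows_blank: List[bool], width: int,
                     generate: Callable[[Block], str]) -> str:
    """End-to-end sketch: divide, generate per block, assemble."""
    blocks = divide_into_blocks(rows_blank, width)
    for b in blocks:
        b.html = generate(b)  # in the real system, a CoT-prompted MLLM call
    return assemble_absolute(blocks)
```

For example, a design with two content regions separated by a blank band yields two absolutely positioned `<div>`s, so each block's generated code cannot disturb its neighbors' layout (which is the point of the absolute-positioning assembly strategy; the MLLM-based assembly and the dynamic selection between the two are omitted here):

```python
page = lat_coder_sketch(
    [False] * 50 + [True] * 10 + [False] * 40,  # 100 pixel rows
    width=800,
    generate=lambda b: f"<p>block at y={b.y}</p>",
)
```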