
LaTCoder: Converting Webpage Design to Code with Layout-as-Thought

August 5, 2025
Authors: Yi Gui, Zhen Li, Zhongyi Zhang, Guohao Wang, Tianpeng Lv, Gaoyang Jiang, Yi Liu, Dongping Chen, Yao Wan, Hongyu Zhang, Wenbin Jiang, Xuanhua Shi, Hai Jin
cs.AI

Abstract

Converting webpage designs into code (design-to-code) plays a vital role in User Interface (UI) development for front-end developers, bridging the gap between visual design and functional implementation. While recent Multimodal Large Language Models (MLLMs) have shown significant potential in design-to-code tasks, they often fail to accurately preserve the layout during code generation. To this end, we draw inspiration from Chain-of-Thought (CoT) reasoning in human cognition and propose LaTCoder, a novel approach that enhances layout preservation in webpage design during code generation with Layout-as-Thought (LaT). Specifically, we first introduce a simple yet efficient algorithm to divide the webpage design into image blocks. Next, we prompt MLLMs using a CoT-based approach to generate code for each block. Finally, we apply two assembly strategies, absolute positioning and an MLLM-based method, followed by dynamic selection to determine the optimal output. We evaluate the effectiveness of LaTCoder using multiple backbone MLLMs (i.e., DeepSeek-VL2, Gemini, and GPT-4o) on both a public benchmark and a newly introduced, more challenging benchmark (CC-HARD) that features complex layouts. The experimental results on automatic metrics demonstrate significant improvements: TreeBLEU scores increased by 66.67% and MAE decreased by 38% when using DeepSeek-VL2, compared to direct prompting. Moreover, human preference evaluation results indicate that annotators favor the webpages generated by LaTCoder in over 60% of cases, providing strong evidence of the effectiveness of our method.
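The three-stage pipeline sketched in the abstract (divide the design into blocks, generate code per block, reassemble with absolute positioning) can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace-based vertical splitting heuristic and the `generate` callback (which stands in for a per-block MLLM call) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Block:
    x: int      # position and size of the block in the design image
    y: int
    w: int
    h: int
    html: str = ""  # code generated for this block

def divide_into_blocks(rows_blank: List[bool], width: int) -> List[Block]:
    """Split the design vertically at runs of blank rows.

    A toy stand-in for the paper's block-division algorithm:
    `rows_blank[y]` is True when pixel row y contains no content.
    """
    blocks: List[Block] = []
    start = None
    for y, blank in enumerate(rows_blank):
        if not blank and start is None:
            start = y                       # a content region begins
        elif blank and start is not None:
            blocks.append(Block(0, start, width, y - start))
            start = None                    # the region ended
    if start is not None:                   # region runs to the bottom edge
        blocks.append(Block(0, start, width, len(rows_blank) - start))
    return blocks

def assemble_absolute(blocks: List[Block]) -> str:
    """Assemble per-block snippets using CSS absolute positioning,
    placing each block at its original coordinates in the design."""
    divs = "\n".join(
        f'<div style="position:absolute; left:{b.x}px; top:{b.y}px; '
        f'width:{b.w}px; height:{b.h}px;">{b.html}</div>'
        for b in blocks
    )
    return f'<body style="position:relative; margin:0;">\n{divs}\n</body>'

def lat_coder_sketch(rows_blank: List[bool], width: int,
                     generate: Callable[[Block], str]) -> str:
    """End-to-end sketch: divide, generate per block, assemble."""
    blocks = divide_into_blocks(rows_blank, width)
    for b in blocks:
        b.html = generate(b)  # in the real system, a CoT-prompted MLLM call
    return assemble_absolute(blocks)
```

For example, a design with two content regions separated by a blank band yields two absolutely positioned `<div>`s, so each block's generated code cannot disturb its neighbors' layout (which is the point of the absolute-positioning assembly strategy; the MLLM-based assembly and the dynamic selection between the two are omitted here):

```python
page = lat_coder_sketch(
    [False] * 50 + [True] * 10 + [False] * 40,  # 100 pixel rows
    width=800,
    generate=lambda b: f"<p>block at y={b.y}</p>",
)
```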