
LaTCoder: Converting Webpage Design to Code with Layout-as-Thought

August 5, 2025
Authors: Yi Gui, Zhen Li, Zhongyi Zhang, Guohao Wang, Tianpeng Lv, Gaoyang Jiang, Yi Liu, Dongping Chen, Yao Wan, Hongyu Zhang, Wenbin Jiang, Xuanhua Shi, Hai Jin
cs.AI

Abstract

Converting webpage designs into code (design-to-code) plays a vital role in User Interface (UI) development for front-end developers, bridging the gap between visual design and functional implementation. While recent Multimodal Large Language Models (MLLMs) have shown significant potential in design-to-code tasks, they often fail to accurately preserve the layout during code generation. To this end, we draw inspiration from the Chain-of-Thought (CoT) reasoning in human cognition and propose LaTCoder, a novel approach that enhances layout preservation in webpage design during code generation with Layout-as-Thought (LaT). Specifically, we first introduce a simple yet efficient algorithm to divide the webpage design into image blocks. Next, we prompt MLLMs using a CoT-based approach to generate code for each block. Finally, we apply two assembly strategies (absolute positioning and an MLLM-based method), followed by dynamic selection to determine the optimal output. We evaluate the effectiveness of LaTCoder using multiple backbone MLLMs (i.e., DeepSeek-VL2, Gemini, and GPT-4o) on both a public benchmark and a newly introduced, more challenging benchmark (CC-HARD) that features complex layouts. The experimental results on automatic metrics demonstrate significant improvements. Specifically, TreeBLEU scores increased by 66.67% and MAE decreased by 38% when using DeepSeek-VL2, compared to direct prompting. Moreover, the human preference evaluation results indicate that annotators favor the webpages generated by LaTCoder in over 60% of cases, providing strong evidence of the effectiveness of our method.
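The three-stage pipeline the abstract describes (divide the design into image blocks, generate code per block, then assemble) can be sketched as below. This is a minimal illustration only: the uniform-grid divider stands in for the paper's layout-aware division algorithm, and `generate_block_html` is a hypothetical stub for the CoT-based MLLM call; neither reflects LaTCoder's actual implementation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Block:
    x: int          # left offset of the block in the design, in px
    y: int          # top offset in px
    w: int          # block width in px
    h: int          # block height in px
    html: str = ""  # code generated for this block

def divide_into_blocks(width: int, height: int,
                       rows: int = 2, cols: int = 2) -> List[Block]:
    """Toy divider: a uniform grid stands in for the paper's
    layout-aware block-division algorithm (assumption for illustration)."""
    bw, bh = width // cols, height // rows
    return [Block(c * bw, r * bh, bw, bh)
            for r in range(rows) for c in range(cols)]

def generate_block_html(block: Block) -> str:
    """Hypothetical stub for the per-block CoT-based MLLM call; a real
    system would send the cropped block image to the model."""
    return f"<div>block at ({block.x},{block.y})</div>"

def assemble_absolute(blocks: List[Block]) -> str:
    """Assembly via absolute positioning: each block's generated code is
    wrapped in a container pinned to the block's original coordinates."""
    parts = [
        f'<div style="position:absolute;left:{b.x}px;top:{b.y}px;'
        f'width:{b.w}px;height:{b.h}px;">{b.html}</div>'
        for b in blocks
    ]
    return '<body style="position:relative;">' + "".join(parts) + "</body>"

blocks = divide_into_blocks(1200, 800)
for b in blocks:
    b.html = generate_block_html(b)
page = assemble_absolute(blocks)
```

The MLLM-based assembly strategy and the dynamic selection between the two assembled outputs are omitted here; this sketch shows only the absolute-positioning path.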