Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset

March 14, 2024
Authors: Hugo Laurençon, Léo Tronchon, Victor Sanh
cs.AI

Abstract

Using vision-language models (VLMs) in web development presents a promising strategy to increase efficiency and unblock no-code solutions: given a screenshot or a sketch of a UI, a VLM could generate the code to reproduce it, for instance in a language like HTML. Despite the advancements in VLMs for various tasks, the specific challenge of converting a screenshot into the corresponding HTML has received little attention. We posit that this is mainly due to the absence of a suitable, high-quality dataset. This work introduces WebSight, a synthetic dataset consisting of 2 million pairs of HTML code and corresponding screenshots. We fine-tune a foundational VLM on our dataset and show proficiency in converting webpage screenshots to functional HTML code. To accelerate research in this area, we open-source WebSight.
