Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset

March 14, 2024
Authors: Hugo Laurençon, Léo Tronchon, Victor Sanh
cs.AI

Abstract

Using vision-language models (VLMs) in web development presents a promising strategy to increase efficiency and unblock no-code solutions: given a screenshot or a sketch of a UI, a VLM could generate the code to reproduce it, for instance in a language like HTML. Despite the advances in VLMs on various tasks, the specific challenge of converting a screenshot into the corresponding HTML has been minimally explored. We posit that this is mainly due to the absence of a suitable, high-quality dataset. This work introduces WebSight, a synthetic dataset consisting of 2 million pairs of HTML code and corresponding screenshots. We fine-tune a foundational VLM on our dataset and show proficiency in converting webpage screenshots to functional HTML code. To accelerate research in this area, we open-source WebSight.

