
WikiWeb2M: A Page-Level Multimodal Wikipedia Dataset

May 9, 2023
作者: Andrea Burns, Krishna Srinivasan, Joshua Ainslie, Geoff Brown, Bryan A. Plummer, Kate Saenko, Jianmo Ni, Mandy Guo
cs.AI

Abstract

Webpages have been a rich resource for language and vision-language tasks, yet only pieces of webpages are kept: image-caption pairs, long text articles, or raw HTML, never all in one place. As a result, webpage tasks have received little attention and structured image-text data has been underused. To study multimodal webpage understanding, we introduce the Wikipedia Webpage 2M (WikiWeb2M) suite, the first to retain the full set of images, text, and structure data available in a page. WikiWeb2M can be used for tasks like page description generation, section summarization, and contextual image captioning.
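To make the page-level setting concrete, here is a minimal sketch of how a record that retains text, images, and section structure on a single page might be modeled, and how the three tasks map onto its fields. All class and field names are hypothetical illustrations, not the dataset's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical page-level record: sections keep their text AND their
# images, so all modalities and structure stay in one place.

@dataclass
class ImageWithCaption:
    url: str
    caption: Optional[str] = None  # target for contextual image captioning


@dataclass
class Section:
    title: str
    text: str
    images: List[ImageWithCaption] = field(default_factory=list)


@dataclass
class WebPage:
    url: str
    title: str
    description: Optional[str]  # target for page description generation
    sections: List[Section] = field(default_factory=list)

    def section_summary_inputs(self):
        # For section summarization, each section contributes its title,
        # body text, and image URLs; the rest of the page is available
        # as context because the whole page is stored together.
        return [
            (s.title, s.text, [img.url for img in s.images])
            for s in self.sections
        ]


# Example: one page with a single section holding text and an image.
page = WebPage(
    url="https://en.wikipedia.org/wiki/Example",
    title="Example",
    description=None,
    sections=[
        Section(
            title="History",
            text="Example text for the section.",
            images=[ImageWithCaption(url="example.jpg")],
        )
    ],
)
print(page.section_summary_inputs())
```

The design choice the sketch illustrates is the one the abstract emphasizes: because images are attached to the sections they appear in, rather than stored as isolated image-caption pairs, a model can draw on surrounding page context for all three tasks.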