

WikiWeb2M: A Page-Level Multimodal Wikipedia Dataset

May 9, 2023
作者: Andrea Burns, Krishna Srinivasan, Joshua Ainslie, Geoff Brown, Bryan A. Plummer, Kate Saenko, Jianmo Ni, Mandy Guo
cs.AI

Abstract

Webpages have been a rich resource for language and vision-language tasks. Yet only pieces of webpages are kept: image-caption pairs, long text articles, or raw HTML, never all in one place. As a result, webpage tasks have received little attention and structured image-text data has been left underused. To study multimodal webpage understanding, we introduce the Wikipedia Webpage 2M (WikiWeb2M) suite, the first to retain the full set of images, text, and structure data available in a page. WikiWeb2M can be used for tasks like page description generation, section summarization, and contextual image captioning.