WikiWeb2M: A Page-Level Multimodal Wikipedia Dataset

Abstract

Webpages have been a rich resource for language and vision-language tasks. Yet only pieces of webpages have been kept: image-caption pairs, long text articles, or raw HTML, each stored separately and never all in one place. As a result, webpage tasks have received relatively little attention, and structured image-text data has been underused. To study multimodal webpage understanding, we introduce the Wikipedia Webpage 2M (WikiWeb2M) suite, the first dataset to retain the full set of images, text, and structure data available in a page. WikiWeb2M can be used for tasks such as page description generation, section summarization, and contextual image captioning.
