Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions
March 17, 2025
Authors: Wan Ju Kang, Eunki Kim, Na Min An, Sangryul Kim, Haemin Choi, Ki Hoon Kwak, James Thorne
cs.AI
Abstract
Often, the needs and visual abilities differ between the annotator group and
the end user group. Generating detailed diagram descriptions for blind and
low-vision (BLV) users is one such challenging domain. Sighted annotators could
describe visuals with ease, but existing studies have shown that direct
generations by them are costly, bias-prone, and somewhat lacking by BLV
standards. In this study, we ask sighted individuals to assess -- rather than
produce -- diagram descriptions generated by vision-language models (VLMs) that
have been guided with latent supervision via multi-pass inference. The
sighted assessments prove effective and useful to professional educators who
are themselves BLV and teach visually impaired learners. We release Sightation,
a collection of diagram description datasets spanning 5k diagrams and 137k
samples for completion, preference, retrieval, question answering, and
reasoning training purposes, and demonstrate their fine-tuning potential in
various downstream tasks.