
Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions

March 17, 2025
Authors: Wan Ju Kang, Eunki Kim, Na Min An, Sangryul Kim, Haemin Choi, Ki Hoon Kwak, James Thorne
cs.AI

Abstract

The needs and visual abilities of the annotator group often differ from those of the end user group. Generating detailed diagram descriptions for blind and low-vision (BLV) users is one such challenging domain. Sighted annotators can describe visuals with ease, but existing studies have shown that descriptions they write directly are costly, bias-prone, and somewhat lacking by BLV standards. In this study, we ask sighted individuals to assess, rather than produce, diagram descriptions generated by vision-language models (VLMs) that have been guided with latent supervision via multi-pass inference. The sighted assessments prove effective and useful to professional educators who are themselves BLV and teach visually impaired learners. We release Sightation, a collection of diagram description datasets spanning 5k diagrams and 137k samples for completion, preference, retrieval, question answering, and reasoning training purposes, and demonstrate their fine-tuning potential in various downstream tasks.

