

VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks

July 29, 2024
Authors: Juhwan Choi, Junehyoung Kwon, JungMin Yun, Seunguk Yu, YoungBin Kim
cs.AI

Abstract

Domain generalizability is a crucial aspect of a deep learning model since it determines the capability of the model to perform well on data from unseen domains. However, research on the domain generalizability of deep learning models for vision-language tasks remains limited, primarily because of the lack of required datasets. To address these challenges, we propose VolDoGer: Vision-Language Dataset for Domain Generalization, a dedicated dataset designed for domain generalization that addresses three vision-language tasks: image captioning, visual question answering, and visual entailment. We constructed VolDoGer by extending LLM-based data annotation techniques to vision-language tasks, thereby alleviating the burden of recruiting human annotators. We evaluated the domain generalizability of various models, ranging from fine-tuned models to a recent multimodal large language model, through VolDoGer.
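The abstract describes constructing VolDoGer by extending LLM-based data annotation to vision-language tasks. The paper's actual pipeline is not shown here, but the general idea can be sketched as follows for the visual entailment task; the prompt template and the `query_llm` stub are illustrative assumptions, not the authors' implementation, and a real pipeline would call a multimodal LLM API with the image attached.

```python
# Minimal sketch of LLM-assisted annotation for a visual-entailment
# example. All names here are hypothetical; query_llm is a stand-in
# for a real multimodal LLM call.

def build_entailment_prompt(caption: str, hypothesis: str) -> str:
    """Compose an annotation prompt asking the model to label the
    premise/hypothesis pair as entailment, neutral, or contradiction."""
    return (
        "Premise (image caption): " + caption + "\n"
        "Hypothesis: " + hypothesis + "\n"
        "Answer with one word: entailment, neutral, or contradiction."
    )

def query_llm(prompt: str) -> str:
    # Stub: a real pipeline would send `prompt` (plus the image) to a
    # multimodal LLM and parse the returned label string.
    return "entailment"

def annotate(caption: str, hypothesis: str) -> dict:
    """Produce one labeled visual-entailment record via the (stubbed) LLM."""
    label = query_llm(build_entailment_prompt(caption, hypothesis))
    return {"premise": caption, "hypothesis": hypothesis, "label": label}

example = annotate("A dog runs on a beach.", "An animal is outdoors.")
print(example["label"])  # the stub always returns "entailment"
```

Replacing human annotators with such prompted LLM calls is what the abstract refers to as alleviating the burden of recruiting human annotators; quality control of the generated labels is a separate concern addressed in the paper itself.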

