VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks
July 29, 2024
Authors: Juhwan Choi, Junehyoung Kwon, JungMin Yun, Seunguk Yu, YoungBin Kim
cs.AI
Abstract
Domain generalizability is a crucial aspect of a deep learning model since it
determines the capability of the model to perform well on data from unseen
domains. However, research on the domain generalizability of deep learning
models for vision-language tasks remains limited, primarily because of the lack
of required datasets. To address these challenges, we propose VolDoGer:
Vision-Language Dataset for Domain Generalization, a dedicated dataset designed
for domain generalization that addresses three vision-language tasks: image
captioning, visual question answering, and visual entailment. We constructed
VolDoGer by extending LLM-based data annotation techniques to vision-language
tasks, thereby alleviating the burden of recruiting human annotators. We
evaluated the domain generalizability of various models, ranging from
fine-tuned models to a recent multimodal large language model, through
VolDoGer.
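As a rough illustration of what LLM-based data annotation for a vision-language task can look like, the sketch below asks a multimodal LLM to produce an image-caption annotation for an image from a given domain. This is a minimal sketch assuming an OpenAI-style multimodal chat API (the `openai` Python client and the `gpt-4o` model); the paper's actual models, prompts, and annotation pipeline are not specified in this abstract, and the function name and prompt here are hypothetical.

```python
# Hypothetical sketch of LLM-based image-caption annotation; the paper's
# actual prompts, model choice, and pipeline are not given in this abstract.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def annotate_caption(image_path: str, domain: str) -> str:
    """Ask a multimodal LLM for a one-sentence caption of an image from a given domain."""
    # Encode the image as a base64 data URI so it can be sent inline.
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",  # stand-in for whichever multimodal LLM is used
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Write one factual, one-sentence caption for this {domain} image."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content.strip()

# Example usage (hypothetical file and domain label):
# caption = annotate_caption("photo_001.jpg", domain="cartoon")
```

Repeating such a call over images drawn from several visual domains is one way a dataset like VolDoGer could collect caption, VQA, or entailment annotations without recruiting human annotators for every domain.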