
Through the Looking Glass: Common Sense Consistency Evaluation of Weird Images

May 12, 2025
Authors: Elisei Rykov, Kseniia Petrushina, Kseniia Titova, Anton Razzhigaev, Alexander Panchenko, Vasily Konovalov
cs.AI

Abstract

Measuring how real images look is a complex task in artificial intelligence research. For example, an image of a boy with a vacuum cleaner in a desert violates common sense. We introduce a novel method, which we call Through the Looking Glass (TLG), to assess the common sense consistency of images using Large Vision-Language Models (LVLMs) and a Transformer-based encoder. By leveraging LVLMs to extract atomic facts from these images, we obtain a mix of accurate facts. We then fine-tune a compact attention-pooling classifier over the encoded atomic facts. TLG achieves new state-of-the-art performance on the WHOOPS! and WEIRD datasets while relying on only a compact fine-tuned component.
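
The attention-pooling step described in the abstract can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: it assumes each atomic fact has already been encoded into a fixed-size vector, that pooling uses a single learned query vector, and that a logistic layer produces the final score. The names `attention_pool`, `weirdness_probability`, `query`, `clf_weights`, and `clf_bias` are hypothetical and chosen only for illustration; training is omitted.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Normalize scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(fact_embeddings: Sequence[Vector], query: Vector) -> Vector:
    """Weighted average of fact embeddings.

    Assumption: each fact is scored by its dot product with one
    learned query vector; the scores are softmax-normalized.
    """
    weights = softmax([dot(query, e) for e in fact_embeddings])
    dim = len(query)
    return [
        sum(w * e[i] for w, e in zip(weights, fact_embeddings))
        for i in range(dim)
    ]


def weirdness_probability(
    fact_embeddings: Sequence[Vector],
    query: Vector,
    clf_weights: Vector,
    clf_bias: float,
) -> float:
    """Pool the facts, then apply a logistic layer (an assumed classifier head)."""
    pooled = attention_pool(fact_embeddings, query)
    logit = dot(clf_weights, pooled) + clf_bias
    return 1.0 / (1.0 + math.exp(-logit))
```

With a zero query vector every fact receives the same weight, so the pooled vector reduces to the plain mean of the fact embeddings. A trained query would instead concentrate weight on the facts most indicative of a common sense violation.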
