Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning
June 4, 2024
Authors: Wenyan Li, Jiaang Li, Rita Ramos, Raphael Tang, Desmond Elliott
cs.AI
Abstract
Recent advances in retrieval-augmented models for image captioning highlight
the benefit of retrieving related captions for efficient, lightweight models
with strong domain-transfer capabilities. While these models demonstrate the
success of retrieval augmentation, retrieval models are still far from perfect
in practice: the retrieved information can sometimes mislead the model,
resulting in incorrect generation and worse performance. In this paper, we
analyze the robustness of a retrieval-augmented captioning model, SmallCap. Our
analysis shows that the model is sensitive to tokens that appear in the
majority of the retrieved captions, and the input attribution shows that those
tokens are likely copied into the generated output. Given these findings, we
propose to train the model by sampling retrieved captions from more diverse
sets. This decreases the chance that the model learns to copy majority tokens,
and improves both in-domain and cross-domain performance.
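The proposed training change can be illustrated with a minimal sketch: instead of always prompting with the top-k retrieved captions, sample k captions from a larger top-M retrieval pool, so near-duplicate captions are less likely to dominate the prompt. The function name, parameters, and data below are hypothetical, not the paper's actual implementation.

```python
import random

def sample_diverse_captions(ranked_captions, k=4, pool_size=16, seed=0):
    """Hypothetical sketch of diverse retrieval sampling.

    Rather than taking the top-k captions by retrieval score, draw k
    distinct captions uniformly from the top-`pool_size` pool. This
    reduces the chance that tokens shared by the highest-ranked,
    near-duplicate captions are simply copied into the output.
    """
    pool = ranked_captions[:pool_size]
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))

# Illustrative data: captions ordered by descending retrieval score.
ranked = [f"caption {i}" for i in range(20)]
prompt_captions = sample_diverse_captions(ranked, k=4, pool_size=16)
print(len(prompt_captions))
```

At training time, resampling the pool each epoch exposes the model to varied retrieval contexts; at inference, one can still use the top-k captions directly.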