SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization

November 17, 2024
Authors: Hongrui Jia, Chaoya Jiang, Haiyang Xu, Wei Ye, Mengfan Dong, Ming Yan, Ji Zhang, Fei Huang, Shikun Zhang
cs.AI

Abstract

As language models continue to scale, Large Language Models (LLMs) have exhibited emergent In-Context Learning (ICL) capabilities, which let them solve language tasks when a few in-context demonstrations (ICDs) are prepended as context. Inspired by these advances, researchers have extended these techniques to develop Large Multimodal Models (LMMs) with ICL capabilities. However, existing LMMs face a critical issue: they often fail to leverage the visual context in multimodal demonstrations and instead simply follow textual patterns. This indicates that LMMs do not achieve effective alignment between multimodal demonstrations and model outputs. To address this problem, we propose Symbol Demonstration Direct Preference Optimization (SymDPO). Specifically, SymDPO breaks with the traditional paradigm for constructing multimodal demonstrations by replacing the text answers within instances with random symbols. This forces the model to carefully understand the demonstration images and to establish a relationship between the images and the symbols in order to answer questions correctly. We validate the effectiveness of this method on multiple benchmarks, demonstrating that with SymDPO, LMMs can more effectively understand the multimodal context within examples and use this knowledge to answer questions better.
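
The symbol-replacement idea described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the abstract does not say how symbols are generated or how the preferred and rejected responses for preference optimization are constructed, so the `Demo` record, the `random_symbol` and `symbolize` helpers, the four-letter symbol format, and the choice to map identical answers to the same symbol are all assumptions made for illustration.

```python
import random
import string
from dataclasses import dataclass


@dataclass
class Demo:
    """Hypothetical in-context demonstration: image reference, question, answer."""
    image: str
    question: str
    answer: str


def random_symbol(rng, length=4):
    # Assumption: a "symbol" is a short random lowercase string with no meaning.
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def symbolize(demos, query_answer, seed=0):
    """Replace each distinct text answer with a random symbol.

    Identical answers share one symbol, so the only way to predict the
    query's target symbol is to relate images to symbols, not to reuse
    the meaning of the answer text.
    """
    rng = random.Random(seed)
    mapping = {}
    used = set()
    for answer in [d.answer for d in demos] + [query_answer]:
        if answer not in mapping:
            symbol = random_symbol(rng)
            while symbol in used:  # keep symbols unique per answer
                symbol = random_symbol(rng)
            used.add(symbol)
            mapping[answer] = symbol
    new_demos = [Demo(d.image, d.question, mapping[d.answer]) for d in demos]
    return new_demos, mapping[query_answer], mapping


demos = [
    Demo("img_cat.jpg", "What animal is shown?", "cat"),
    Demo("img_dog.jpg", "What animal is shown?", "dog"),
    Demo("img_cat2.jpg", "What animal is shown?", "cat"),
]
symbolic_demos, target_symbol, mapping = symbolize(demos, "cat", seed=0)
for d in symbolic_demos:
    print(d.image, "->", d.answer)
print("target for the query image:", target_symbol)
```

In this toy setup the images and questions are left untouched and only the answers change, so a model that follows the textual pattern alone has no semantic cue to rely on. How such symbolic demonstrations are turned into preference pairs is not described in the abstract and is not shown here.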
November 21, 2024