
Link-Context Learning for Multimodal LLMs

August 15, 2023
Authors: Yan Tai, Weichen Fan, Zhao Zhang, Feng Zhu, Rui Zhao, Ziwei Liu
cs.AI

Abstract

The ability to learn novel concepts from context and deliver appropriate responses is essential in human conversation. Although current Multimodal Large Language Models (MLLMs) and Large Language Models (LLMs) are trained on mega-scale datasets, recognizing unseen images or understanding novel concepts in a training-free manner remains a challenge. In-Context Learning (ICL) explores training-free few-shot learning, where models are encouraged to "learn to learn" from limited tasks and generalize to unseen tasks. In this work, we propose Link-Context Learning (LCL), which emphasizes "reasoning from cause and effect" to augment the learning capabilities of MLLMs. LCL goes beyond traditional ICL by explicitly strengthening the causal relationship between the support set and the query set. By providing demonstrations with causal links, LCL guides the model to discern not only the analogy but also the underlying causal associations between data points, which empowers MLLMs to recognize unseen images and understand novel concepts more effectively. To facilitate the evaluation of this novel approach, we introduce the ISEKAI dataset, comprising exclusively unseen, generated image-label pairs designed for link-context learning. Extensive experiments show that our LCL-MLLM exhibits strong link-context learning capabilities on novel concepts, outperforming vanilla MLLMs. Code and data will be released at https://github.com/isekai-portal/Link-Context-Learning.
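To make the setup concrete, below is a minimal sketch of how an LCL-style prompt might be assembled: a support set of image-label demonstration pairs for novel concepts, followed by a query image whose label the model must infer from its link to the demonstrations. All names, paths, and the message format here are illustrative assumptions; the authors' actual prompt construction is in the linked repository.

```python
# Hypothetical sketch of building a link-context learning (LCL) prompt.
# The (role, content) message-part format is an assumption; adapt it to
# your MLLM's interleaved image/text chat API.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Demonstration:
    image_path: str  # path to a support-set image
    label: str       # the (possibly novel) concept it is linked to


def build_lcl_prompt(
    support_set: List[Demonstration], query_image: str
) -> List[Tuple[str, str]]:
    """Interleave image-label demonstration pairs, then pose the query."""
    parts: List[Tuple[str, str]] = []
    for demo in support_set:
        parts.append(("image", demo.image_path))
        parts.append(("text", f"This is a {demo.label}."))
    parts.append(("image", query_image))
    parts.append(("text", "Based on the examples above, what is this?"))
    return parts


# Example: two unseen ISEKAI-style concepts form the support set; the model
# must link the query image to one of them rather than fall back on classes
# memorized during training. The file names are placeholders.
prompt = build_lcl_prompt(
    support_set=[
        Demonstration("isekai/creature_a_1.png", "flying lionfish"),
        Demonstration("isekai/creature_b_1.png", "lava sheep"),
    ],
    query_image="isekai/creature_a_2.png",
)
```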