Link-Context Learning for Multimodal LLMs

Abstract

The ability to learn novel concepts from context and deliver appropriate responses is essential in human conversations. Although current Multimodal Large Language Models (MLLMs) and Large Language Models (LLMs) are trained on mega-scale datasets, recognizing unseen images or understanding novel concepts in a training-free manner remains a challenge. In-Context Learning (ICL) explores training-free few-shot learning, where models are encouraged to "learn to learn" from limited tasks and generalize to unseen tasks. In this work, we propose link-context learning (LCL), which emphasizes "reasoning from cause and effect" to augment the learning capabilities of MLLMs. LCL goes beyond traditional ICL by explicitly strengthening the causal relationship between the support set and the query set. By providing demonstrations with causal links, LCL guides the model to discern not only the analogy but also the underlying causal associations between data points, which empowers MLLMs to recognize unseen images and understand novel concepts more effectively. To facilitate the evaluation of this novel approach, we introduce the ISEKAI dataset, which consists exclusively of unseen generated image-label pairs designed for link-context learning. Extensive experiments show that our LCL-MLLM exhibits strong link-context learning capabilities and adapts to novel concepts better than vanilla MLLMs. Code and data will be released at https://github.com/isekai-portal/Link-Context-Learning.
