Link-Context Learning for Multimodal LLMs

Abstract

The ability to learn novel concepts from context and deliver appropriate responses is essential in human conversation. Although current Multimodal Large Language Models (MLLMs) and Large Language Models (LLMs) are trained on mega-scale datasets, recognizing unseen images or understanding novel concepts in a training-free manner remains a challenge. In-Context Learning (ICL) explores training-free few-shot learning, where models are encouraged to "learn to learn" from limited tasks and generalize to unseen tasks. In this work, we propose link-context learning (LCL), which emphasizes "reasoning from cause and effect" to augment the learning capabilities of MLLMs. LCL goes beyond traditional ICL by explicitly strengthening the causal relationship between the support set and the query set. By providing demonstrations with causal links, LCL guides the model to discern not only the analogy but also the underlying causal associations between data points, which empowers MLLMs to recognize unseen images and understand novel concepts more effectively. To facilitate the evaluation of this novel approach, we introduce the ISEKAI dataset, which consists exclusively of unseen, generated image-label pairs designed for link-context learning. Extensive experiments show that our LCL-MLLM exhibits stronger link-context learning capabilities on novel concepts than vanilla MLLMs. Code and data will be released at https://github.com/isekai-portal/Link-Context-Learning.
