多模态大語言模型的自我改進:綜述
Self-Improvement in Multimodal Large Language Models: A Survey
October 3, 2025
Authors: Shijian Deng, Kai Wang, Tianyu Yang, Harsh Singh, Yapeng Tian
cs.AI
Abstract
Recent advancements in self-improvement for Large Language Models (LLMs) have
efficiently enhanced model capabilities without significantly increasing costs,
particularly in terms of human effort. While this area is still relatively
young, its extension to the multimodal domain holds immense potential for
leveraging diverse data sources and developing more general self-improving
models. This survey is the first to provide a comprehensive overview of
self-improvement in Multimodal LLMs (MLLMs). We organize the current
literature and discuss methods from three perspectives: 1) data collection,
2) data organization, and 3) model optimization, to facilitate the further
development of self-improvement in MLLMs. We also cover commonly used
evaluation methods and downstream applications. Finally, we outline open
challenges and future research directions.
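For intuition, the three perspectives above can be read as stages of a single loop. The toy Python sketch below is not taken from the paper or any surveyed method; the ToyMLLM class, its quality parameter, the score threshold, and all helper logic are hypothetical stand-ins used only to illustrate how data collection, data organization, and model optimization might chain together in a self-improvement cycle.

from dataclasses import dataclass
import random


@dataclass
class ToyMLLM:
    """Hypothetical stand-in for a multimodal LLM; `quality` mimics a learnable parameter."""
    quality: float = 0.1

    def generate(self, example: str) -> tuple[str, float]:
        # Produce a candidate response together with a noisy self-estimated score.
        score = min(1.0, self.quality + random.random() * 0.5)
        return f"response to {example}", score


def self_improvement_step(model: ToyMLLM, inputs: list[str]) -> ToyMLLM:
    # 1) Data collection: sample candidate outputs from the current model.
    candidates = [(x, *model.generate(x)) for x in inputs]

    # 2) Data organization: verify/filter candidates into a curated training set
    #    (here, a simple score threshold stands in for real verification).
    curated = [(x, y) for x, y, score in candidates if score > 0.4]

    # 3) Model optimization: update the model on its own curated outputs
    #    (here, a toy parameter bump stands in for fine-tuning).
    model.quality = min(1.0, model.quality + 0.05 * len(curated))
    return model


if __name__ == "__main__":
    mllm = ToyMLLM()
    for step in range(3):
        mllm = self_improvement_step(mllm, [f"image_{i}" for i in range(4)])
        print(f"step {step}: quality = {mllm.quality:.2f}")

In practice, each stage is far richer than this sketch suggests: collection may involve sampling captions, answers, or reasoning chains across modalities; organization may use self-consistency checks, reward models, or preference pairs; and optimization may use supervised fine-tuning or preference-based objectives.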