多模态大語言模型的自我改進:綜述
Self-Improvement in Multimodal Large Language Models: A Survey
October 3, 2025
Authors: Shijian Deng, Kai Wang, Tianyu Yang, Harsh Singh, Yapeng Tian
cs.AI
Abstract
Recent advancements in self-improvement for Large Language Models (LLMs) have
efficiently enhanced model capabilities without significantly increasing costs,
particularly in terms of human effort. While this area is still relatively
young, its extension to the multimodal domain holds immense potential for
leveraging diverse data sources and developing more general self-improving
models. This survey is the first to provide a comprehensive overview of
self-improvement in Multimodal LLMs (MLLMs). We organize the current
literature and discuss methods from three perspectives: 1) data collection,
2) data organization, and 3) model optimization, to facilitate the further
development of self-improvement in MLLMs. We also cover commonly used
evaluation methods and downstream applications. Finally, we outline open
challenges and future research directions.
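For intuition, the three perspectives above can be read as stages of a single loop. The toy Python sketch below is not taken from the paper or any surveyed method; the ToyMLLM class, its quality parameter, the score threshold, and all helper logic are hypothetical stand-ins used only to illustrate how data collection, data organization, and model optimization might chain together in a self-improvement cycle.

from dataclasses import dataclass
import random


@dataclass
class ToyMLLM:
    """Hypothetical stand-in for a multimodal LLM; `quality` mimics a learnable parameter."""
    quality: float = 0.1

    def generate(self, example: str) -> tuple[str, float]:
        # Produce a candidate response together with a noisy self-estimated score.
        score = min(1.0, self.quality + random.random() * 0.5)
        return f"response to {example}", score


def self_improvement_step(model: ToyMLLM, inputs: list[str]) -> ToyMLLM:
    # 1) Data collection: sample candidate outputs from the current model.
    candidates = [(x, *model.generate(x)) for x in inputs]

    # 2) Data organization: verify/filter candidates into a curated training set
    #    (here, a simple score threshold stands in for real verification).
    curated = [(x, y) for x, y, score in candidates if score > 0.4]

    # 3) Model optimization: update the model on its own curated outputs
    #    (here, a toy parameter bump stands in for fine-tuning).
    model.quality = min(1.0, model.quality + 0.05 * len(curated))
    return model


if __name__ == "__main__":
    mllm = ToyMLLM()
    for step in range(3):
        mllm = self_improvement_step(mllm, [f"image_{i}" for i in range(4)])
        print(f"step {step}: quality = {mllm.quality:.2f}")

In practice, each stage is far richer than this sketch suggests: collection may involve sampling captions, answers, or reasoning chains across modalities; organization may use self-consistency checks, reward models, or preference pairs; and optimization may use supervised fine-tuning or preference-based objectives.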