Self-Improvement in Multimodal Large Language Models: A Survey
October 3, 2025
作者: Shijian Deng, Kai Wang, Tianyu Yang, Harsh Singh, Yapeng Tian
cs.AI
Abstract
Recent advancements in self-improvement for Large Language Models (LLMs) have efficiently enhanced model capabilities without significantly increasing costs, particularly in terms of human effort. While this area is still relatively young, its extension to the multimodal domain holds immense potential for leveraging diverse data sources and developing more general self-improving models. This survey is the first to provide a comprehensive overview of self-improvement in Multimodal LLMs (MLLMs). We systematically review the current literature and discuss methods from three perspectives: 1) data collection, 2) data organization, and 3) model optimization, to facilitate the further development of self-improvement in MLLMs. We also cover commonly used evaluation methods and downstream applications. Finally, we conclude by outlining open challenges and future research directions.
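
To make the three-perspective taxonomy concrete, the sketch below illustrates, under broad assumptions, how data collection, data organization, and model optimization can compose into a single self-improvement round. It is a minimal illustration, not a method from the survey: the callables `generate`, `score`, and `optimize`, and the preference-pair heuristic, are hypothetical stand-ins for an MLLM's sampling, self-evaluation, and training procedures.

```python
# Minimal sketch of one generic self-improvement round for an MLLM,
# organized around the survey's three stages. All function names are
# hypothetical placeholders, not an API described in the paper.
from typing import Callable, List, Tuple


def self_improve_round(
    prompts: List[str],
    generate: Callable[[str, int], List[str]],   # prompt -> n candidate responses
    score: Callable[[str, str], float],          # (prompt, response) -> quality score
    optimize: Callable[[List[Tuple[str, str, str]]], None],  # (prompt, chosen, rejected)
    n_candidates: int = 4,
) -> None:
    """One round: collect data, organize it into preference pairs, optimize."""
    preference_pairs: List[Tuple[str, str, str]] = []
    for prompt in prompts:
        # 1) Data collection: sample several candidate responses from the model.
        candidates = generate(prompt, n_candidates)
        # 2) Data organization: rank candidates by a self-assigned score and
        #    keep a (chosen, rejected) pair when there is a clear quality gap.
        ranked = sorted(candidates, key=lambda r: score(prompt, r))
        if len(ranked) >= 2 and score(prompt, ranked[-1]) > score(prompt, ranked[0]):
            preference_pairs.append((prompt, ranked[-1], ranked[0]))
    # 3) Model optimization: update the model on the curated preference data
    #    (e.g., via preference-based fine-tuning).
    optimize(preference_pairs)
```

In practice, the three hooks would be backed by the same MLLM (or an auxiliary reward/judge model), and the round would be repeated for several iterations; the survey's taxonomy covers the many concrete choices available at each of these three stages.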