Self-Improvement in Multimodal Large Language Models: A Survey
October 3, 2025
作者: Shijian Deng, Kai Wang, Tianyu Yang, Harsh Singh, Yapeng Tian
cs.AI
Abstract
Recent advancements in self-improvement for Large Language Models (LLMs) have efficiently enhanced model capabilities without significantly increasing costs, particularly in terms of human effort. While this area is still relatively young, its extension to the multimodal domain holds immense potential for leveraging diverse data sources and developing more general self-improving models. This survey is the first to provide a comprehensive overview of self-improvement in Multimodal LLMs (MLLMs). We systematically review the current literature and discuss methods from three perspectives: 1) data collection, 2) data organization, and 3) model optimization, to facilitate the further development of self-improvement in MLLMs. We also cover commonly used evaluation methods and downstream applications. Finally, we conclude by outlining open challenges and future research directions.
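
To make the three-perspective taxonomy concrete, the sketch below illustrates, under broad assumptions, how data collection, data organization, and model optimization can compose into a single self-improvement round. It is a minimal illustration, not a method from the survey: the callables `generate`, `score`, and `optimize`, and the preference-pair heuristic, are hypothetical stand-ins for an MLLM's sampling, self-evaluation, and training procedures.

```python
# Minimal sketch of one generic self-improvement round for an MLLM,
# organized around the survey's three stages. All function names are
# hypothetical placeholders, not an API described in the paper.
from typing import Callable, List, Tuple


def self_improve_round(
    prompts: List[str],
    generate: Callable[[str, int], List[str]],   # prompt -> n candidate responses
    score: Callable[[str, str], float],          # (prompt, response) -> quality score
    optimize: Callable[[List[Tuple[str, str, str]]], None],  # (prompt, chosen, rejected)
    n_candidates: int = 4,
) -> None:
    """One round: collect data, organize it into preference pairs, optimize."""
    preference_pairs: List[Tuple[str, str, str]] = []
    for prompt in prompts:
        # 1) Data collection: sample several candidate responses from the model.
        candidates = generate(prompt, n_candidates)
        # 2) Data organization: rank candidates by a self-assigned score and
        #    keep a (chosen, rejected) pair when there is a clear quality gap.
        ranked = sorted(candidates, key=lambda r: score(prompt, r))
        if len(ranked) >= 2 and score(prompt, ranked[-1]) > score(prompt, ranked[0]):
            preference_pairs.append((prompt, ranked[-1], ranked[0]))
    # 3) Model optimization: update the model on the curated preference data
    #    (e.g., via preference-based fine-tuning).
    optimize(preference_pairs)
```

In practice, the three hooks would be backed by the same MLLM (or an auxiliary reward/judge model), and the round would be repeated for several iterations; the survey's taxonomy covers the many concrete choices available at each of these three stages.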