Self-Improvement in Multimodal Large Language Models: A Survey

Abstract

Recent advances in self-improvement for Large Language Models (LLMs) have enhanced model capabilities efficiently, without significantly increasing costs, particularly human effort. Although this area is still relatively young, extending it to the multimodal domain holds great potential for leveraging diverse data sources and developing more general self-improving models. This survey is the first to provide a comprehensive overview of self-improvement in Multimodal LLMs (MLLMs). We give a structured overview of the current literature and discuss methods from three perspectives: 1) data collection, 2) data organization, and 3) model optimization, with the aim of facilitating further development of self-improvement in MLLMs. We also cover commonly used evaluations and downstream applications. Finally, we conclude by outlining open challenges and future research directions.
