M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment

Abstract

The rapid advancement of AI-generated image (AGI) models has made evaluating their output a significant challenge, since quality must be judged along several dimensions, including perceptual quality, prompt correspondence, and authenticity. To address this, we propose M3-AGIQA, a comprehensive framework for AGI quality assessment that is Multimodal, Multi-Round, and Multi-Aspect. Our approach uses Multimodal Large Language Models (MLLMs) as joint text and image encoders and distills the advanced captioning capabilities of online MLLMs into a local model via Low-Rank Adaptation (LoRA) fine-tuning. The framework includes a structured multi-round evaluation mechanism in which intermediate image descriptions are generated to provide deeper insight into the quality, correspondence, and authenticity aspects. To align predictions with human perceptual judgments, a predictor composed of an xLSTM and a regression head processes sequential logits and predicts Mean Opinion Scores (MOSs). Extensive experiments on multiple benchmark datasets demonstrate that M3-AGIQA achieves state-of-the-art performance and effectively captures nuanced aspects of AGI quality. Furthermore, cross-dataset validation confirms its strong generalizability. The code is available at https://github.com/strawhatboy/M3-AGIQA.
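The abstract names LoRA fine-tuning as the way captioning ability is distilled into the local model. As a rough illustration of what a low-rank adapter does, the following is a minimal, dependency-free sketch of the generic LoRA update, in which a frozen weight W is augmented by a scaled low-rank product (alpha / r) * B * A. It is not taken from the M3-AGIQA code: the class name `LoRALinear`, the toy 4x4 weight, and the rank and alpha values are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical toy layer: frozen W plus a trainable update (alpha / r) * B @ A."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen pretrained weight, never updated
        self.scale = alpha / rank
        # A (rank x d_in) gets a small random init; B (d_out x rank) starts at zero,
        # so the adapted layer initially behaves exactly like the frozen one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        # Only A and B would be trained; W stays fixed.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Toy values, assumed purely for illustration.
W = [[1.0, 0.0, 2.0, 0.0],
     [0.0, 1.0, 0.0, 2.0],
     [1.0, 1.0, 1.0, 1.0],
     [0.5, 0.0, 0.0, 0.5]]
layer = LoRALinear(W, rank=1, alpha=2.0)
x = [1.0, 2.0, 3.0, 4.0]

y_initial = layer.forward(x)                 # equals W @ x because B is zero
n_trainable = layer.trainable_params()       # 8 adapter values vs. 16 in W
```

With rank 1 the adapter holds 8 trainable values against 16 in the full matrix; at the scale of an MLLM the same idea leaves the pretrained weights untouched while training only a small fraction of the parameters.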
