M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment
February 21, 2025
作者: Chuan Cui, Kejiang Chen, Zhihua Wei, Wen Shen, Weiming Zhang, Nenghai Yu
cs.AI
Abstract
The rapid advancement of AI-generated image (AGI) models has introduced
significant challenges in evaluating their quality, which requires considering
multiple dimensions such as perceptual quality, prompt correspondence, and
authenticity. To address these challenges, we propose M3-AGIQA, a comprehensive
framework for AGI quality assessment that is Multimodal, Multi-Round, and
Multi-Aspect. Our approach leverages the capabilities of Multimodal Large
Language Models (MLLMs) as joint text and image encoders and distills advanced
captioning capabilities from online MLLMs into a local model via Low-Rank
Adaptation (LoRA) fine-tuning. The framework includes a structured multi-round
evaluation mechanism, where intermediate image descriptions are generated to
provide deeper insights into the quality, correspondence, and authenticity
aspects. To align predictions with human perceptual judgments, a predictor
constructed by an xLSTM and a regression head is incorporated to process
sequential logits and predict Mean Opinion Scores (MOSs). Extensive experiments
conducted on multiple benchmark datasets demonstrate that M3-AGIQA achieves
state-of-the-art performance, effectively capturing nuanced aspects of AGI
quality. Furthermore, cross-dataset validation confirms its strong
generalizability. The code is available at
https://github.com/strawhatboy/M3-AGIQA.
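To make the LoRA distillation step above concrete, the sketch below shows the core low-rank update that LoRA fine-tuning applies to a frozen weight matrix: only the small factors A and B are trained, and B's zero initialization means the adapted layer starts out identical to the pretrained one. This is a minimal NumPy illustration of the general LoRA mechanism, not the paper's implementation; all sizes, the scaling factor, and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in, r = 6, 8, 2        # illustrative layer dimensions and LoRA rank
alpha = 4.0                     # illustrative LoRA scaling factor

W = rng.normal(size=(d_out, d_in))         # frozen pretrained weight
A = rng.normal(scale=0.1, size=(r, d_in))  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection, zero-initialized

# Effective weight during fine-tuning: only A and B receive gradient updates,
# so the number of trainable parameters is r*(d_in + d_out) instead of d_out*d_in.
W_eff = W + (alpha / r) * (B @ A)

x = rng.normal(size=d_in)
# Because B starts at zero, the adapted layer initially matches the frozen layer.
assert np.allclose(W_eff @ x, W @ x)
```

After training, the update `(alpha / r) * (B @ A)` can be merged into `W` so inference incurs no extra cost, which is the usual motivation for this parametrization.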