M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment

Abstract

The rapid advancement of AI-generated image (AGI) models has made quality evaluation significantly harder, because it must account for multiple dimensions such as perceptual quality, prompt correspondence, and authenticity. To address these challenges, we propose M3-AGIQA, a comprehensive framework for AGI quality assessment that is Multimodal, Multi-Round, and Multi-Aspect. Our approach uses Multimodal Large Language Models (MLLMs) as joint text and image encoders and distills the advanced captioning capabilities of online MLLMs into a local model via Low-Rank Adaptation (LoRA) fine-tuning. The framework includes a structured multi-round evaluation mechanism in which intermediate image descriptions are generated to provide deeper insight into the quality, correspondence, and authenticity aspects. To align predictions with human perceptual judgments, a predictor consisting of an xLSTM and a regression head processes the sequential logits and predicts Mean Opinion Scores (MOSs). Extensive experiments on multiple benchmark datasets demonstrate that M3-AGIQA achieves state-of-the-art performance and effectively captures nuanced aspects of AGI quality. Furthermore, cross-dataset validation confirms its strong generalizability. The code is available at https://github.com/strawhatboy/M3-AGIQA.
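
To make the predictor stage concrete, the following is a minimal sketch of how a sequence of logit vectors could be passed through a recurrent layer and a regression head to yield a single score. It is not the authors' implementation. A plain LSTM cell stands in for the xLSTM, the weights are random and untrained, and the 1 to 5 MOS range with sigmoid squashing is an assumption made for illustration. The names `TinyLSTMCell` and `MOSPredictor` are hypothetical.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMCell:
    """Plain LSTM cell, used here only as a stand-in for the xLSTM block."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        cols = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.weights = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                   for _ in range(hidden_size)]
            for gate in "ifog"
        }
        self.biases = {gate: [0.0] * hidden_size for gate in "ifog"}

    def _affine(self, gate: str, xh: list) -> list:
        return [
            sum(w * v for w, v in zip(row, xh)) + b
            for row, b in zip(self.weights[gate], self.biases[gate])
        ]

    def step(self, x: list, h: list, c: list) -> tuple:
        xh = list(x) + list(h)
        i = [sigmoid(v) for v in self._affine("i", xh)]
        f = [sigmoid(v) for v in self._affine("f", xh)]
        o = [sigmoid(v) for v in self._affine("o", xh)]
        g = [math.tanh(v) for v in self._affine("g", xh)]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


class MOSPredictor:
    """Recurrent layer followed by a linear regression head (untrained)."""

    def __init__(self, input_size: int, hidden_size: int = 8,
                 mos_min: float = 1.0, mos_max: float = 5.0, seed: int = 0) -> None:
        self.cell = TinyLSTMCell(input_size, hidden_size, seed)
        rng = random.Random(seed + 1)
        self.head_w = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
        self.head_b = 0.0
        # Assumed MOS range; the source does not state one.
        self.mos_min, self.mos_max = mos_min, mos_max

    def predict(self, logits_sequence: list) -> float:
        h = [0.0] * self.cell.hidden_size
        c = [0.0] * self.cell.hidden_size
        # Consume the logits one step at a time, keeping the last hidden state.
        for logits in logits_sequence:
            h, c = self.cell.step(logits, h, c)
        raw = sum(w * v for w, v in zip(self.head_w, h)) + self.head_b
        # Squash the head output into the assumed MOS range.
        return self.mos_min + (self.mos_max - self.mos_min) * sigmoid(raw)


# Usage: five time steps of 16-dimensional synthetic "logits".
rng = random.Random(42)
logits_sequence = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(5)]
predictor = MOSPredictor(input_size=16)
score = predictor.predict(logits_sequence)
print(f"predicted MOS (untrained sketch): {score:.3f}")
```

In practice the recurrent layer and head would be trained against human-annotated MOS labels in a deep learning framework. The pure-Python version here only illustrates the data flow from sequential logits to a scalar score.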
