
Differences That Matter: Auditing Models for Capability Gap Discovery and Rectification

December 18, 2025
Authors: Qihao Liu, Chengzhi Mao, Yaojie Liu, Alan Yuille, Wen-Sheng Chu
cs.AI

Abstract

Conventional evaluation methods for multimodal LLMs (MLLMs) lack interpretability and are often insufficient to fully disclose significant capability gaps across models. To address this, we introduce AuditDM, an automated framework that actively discovers and rectifies MLLM failure modes by auditing their divergence. AuditDM fine-tunes an MLLM as an auditor via reinforcement learning to generate challenging questions and counterfactual images that maximize disagreement among target models. Once trained, the auditor uncovers diverse, interpretable exemplars that reveal model weaknesses and serve as annotation-free data for rectification. When applied to SoTA models like Gemma-3 and PaliGemma-2, AuditDM discovers more than 20 distinct failure types. Fine-tuning on these discoveries consistently improves all models across 16 benchmarks, and enables a 3B model to surpass its 28B counterpart. Our results suggest that as data scaling hits diminishing returns, targeted model auditing offers an effective path to model diagnosis and improvement.
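The abstract does not spell out how "disagreement among target models" is scored, so the following is a minimal illustrative sketch, not the authors' implementation. It assumes each target model answers an (image, question) probe with a short string and uses an exact-match majority vote; the function name `disagreement_reward` and this particular formulation are assumptions for illustration.

```python
from collections import Counter
from typing import Any, Callable, List

def disagreement_reward(
    image: Any,
    question: str,
    target_models: List[Callable[[Any, str], str]],
) -> float:
    """Hypothetical RL reward for an auditor probe (not the paper's exact design)."""
    # Collect each target model's answer to the (image, question) probe
    # and normalize for exact-match comparison.
    answers = [m(image, question).strip().lower() for m in target_models]
    # Fraction of models that agree with the most common answer.
    majority = Counter(answers).most_common(1)[0][1] / len(answers)
    # 0.0 when all models agree; approaches 1 - 1/N for an even split,
    # so the auditor is rewarded for probes that divide the models.
    return 1.0 - majority

# Toy usage: two of three stand-in models agree, so the reward is 1 - 2/3.
models = [lambda img, q: "red", lambda img, q: "blue", lambda img, q: "red"]
print(disagreement_reward(None, "What color is the cup?", models))
```

Under this kind of objective, a probe that every model answers identically yields zero reward, which pushes the auditor toward the challenging questions and counterfactual images the paper describes.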