EmoCaliber: Advancing Reliable Visual Emotion Comprehension via Confidence Verbalization and Calibration
December 17, 2025
Authors: Daiqing Wu, Dongbao Yang, Can Ma, Yu Zhou
cs.AI
Abstract
Visual Emotion Comprehension (VEC) aims to infer sentiment polarities or emotion categories from affective cues embedded in images. In recent years, Multimodal Large Language Models (MLLMs) have established a popular paradigm in VEC, leveraging their generalizability to unify VEC tasks defined under diverse emotion taxonomies. While this paradigm achieves notable success, it typically formulates VEC as a deterministic task, requiring the model to output a single, definitive emotion label for each image. Such a formulation does not adequately account for the inherent subjectivity of emotion perception, and it overlooks alternative interpretations that may be equally plausible to different viewers. To address this limitation, we propose equipping MLLMs with the ability to verbalize their confidence in emotion predictions. This additional signal gives users an estimate of both the plausibility of alternative interpretations and the MLLM's self-assessed competence, thereby enhancing reliability in practice. Building on this insight, we introduce a three-stage training framework that progressively endows the model with structured reasoning, teaches it to verbalize confidence, and calibrates its confidence expression, culminating in EmoCaliber, a confidence-aware MLLM for VEC. Through fair and comprehensive evaluations on the unified benchmark VECBench, EmoCaliber demonstrates overall superiority over existing methods in both emotion prediction and confidence estimation. These results validate the effectiveness of our approach and mark a feasible step toward more reliable VEC systems. Project page: https://github.com/wdqqdw/EmoCaliber.
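As an illustration of what verbalized confidence and calibration can mean in practice, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical response format in which the model ends its answer with a phrase such as "Confidence: 85%", and it scores calibration with Expected Calibration Error (ECE), a common calibration metric; the abstract does not state which output format or metric EmoCaliber and VECBench actually use. The names `parse_verbalized_confidence` and `expected_calibration_error` are introduced here for illustration only.

```python
import re
from typing import List, Optional

# Hypothetical output format: the response contains "Confidence: <0-100>%".
_CONF_PATTERN = re.compile(r"confidence\s*[:=]\s*(\d{1,3})\s*%", re.IGNORECASE)


def parse_verbalized_confidence(response: str) -> Optional[float]:
    """Extract a stated confidence as a float in [0, 1], or None if absent/invalid."""
    match = _CONF_PATTERN.search(response)
    if match is None:
        return None
    percent = int(match.group(1))
    if percent > 100:
        return None
    return percent / 100.0


def expected_calibration_error(
    confidences: List[float], correct: List[bool], n_bins: int = 10
) -> float:
    """ECE: sample-weighted mean over equal-width bins of |accuracy - mean confidence|."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    # Group predictions into equal-width confidence bins over [0, 1].
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, conf in enumerate(confidences):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidence values must lie in [0, 1]")
        bin_index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[bin_index].append(i)

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(confidences[i] for i in members) / len(members)
        accuracy = sum(1 for i in members if correct[i]) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece
```

A well-calibrated model yields an ECE near zero: among predictions stated with about 90% confidence, roughly 90% should match the annotated label. Lower stated confidence on an image can then be read as a signal that other emotion labels may be plausible for it.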