MMPB: It's Time for Multi-Modal Personalization
September 26, 2025
Authors: Jaeik Kim, Woojin Kim, Woohyeon Park, Jaeyoung Do
cs.AI
Abstract
Visual personalization is essential in user-facing AI systems such as smart
homes and healthcare, where aligning model behavior with user-centric concepts
is critical. However, recent large Vision-Language Models (VLMs), despite their
broad applicability, remain underexplored in their ability to adapt to
individual users. In this paper, we introduce MMPB, the first extensive
benchmark for evaluating VLMs on personalization. MMPB comprises 10k
image-query pairs and includes 111 personalizable concepts across four
categories: humans, animals, objects, and characters, with the human category
enriched with preference-grounded queries. We structure personalization into
three main task types, each highlighting a different key property of VLMs.
Using 23 widely used VLMs including both open- and closed-source models, we
evaluate personalization performance via a three-stage protocol: concept
injection, multi-turn dialogue, and personalized querying. Our findings
indicate that most VLMs (including some closed-source models) struggle with
personalization, particularly in maintaining consistency over dialogue,
handling user preferences, and adapting to visual cues. Our analysis reveals
that the challenges in VLM personalization (such as refusal behaviors and
long-context forgetting) highlight substantial room for improvement. By
identifying these limitations and offering a scalable benchmark, MMPB offers
valuable insights and a solid foundation for future research toward truly
personalized multi-modal AI. Project Page: aidaslab.github.io/MMPB