MMPB: It's Time for Multi-Modal Personalization
September 26, 2025
Authors: Jaeik Kim, Woojin Kim, Woohyeon Park, Jaeyoung Do
cs.AI
Abstract
Visual personalization is essential in user-facing AI systems such as smart
homes and healthcare, where aligning model behavior with user-centric concepts
is critical. However, recent large Vision-Language Models (VLMs), despite their
broad applicability, remain underexplored in their ability to adapt to
individual users. In this paper, we introduce MMPB, the first extensive
benchmark for evaluating VLMs on personalization. MMPB comprises 10k
image-query pairs and includes 111 personalizable concepts across four
categories: humans, animals, objects, and characters, with the human category
enriched with preference-grounded queries. We structure personalization into
three main task types, each highlighting a different key property of VLMs.
Using 23 widely used VLMs including both open- and closed-source models, we
evaluate personalization performance via a three-stage protocol: concept
injection, multi-turn dialogue, and personalized querying. Our findings
indicate that most VLMs (including some closed-source models) struggle with
personalization, particularly in maintaining consistency over dialogue,
handling user preferences, and adapting to visual cues. Our analysis reveals
challenges in VLM personalization, such as refusal behaviors and
long-context forgetting, highlighting substantial room for improvement. By
identifying these limitations and providing a scalable benchmark, MMPB offers
valuable insights and a solid foundation for future research toward truly
personalized multi-modal AI. Project Page: aidaslab.github.io/MMPB