
MMPB: It's Time for Multi-Modal Personalization

September 26, 2025
作者: Jaeik Kim, Woojin Kim, Woohyeon Park, Jaeyoung Do
cs.AI

Abstract

Visual personalization is essential in user-facing AI systems such as smart homes and healthcare, where aligning model behavior with user-centric concepts is critical. However, recent large Vision-Language Models (VLMs), despite their broad applicability, remain underexplored in their ability to adapt to individual users. In this paper, we introduce MMPB, the first extensive benchmark for evaluating VLMs on personalization. MMPB comprises 10k image-query pairs and includes 111 personalizable concepts across four categories: humans, animals, objects, and characters, with the human category enriched with preference-grounded queries. We structure personalization into three main task types, each highlighting a different key property of VLMs. Using 23 widely used VLMs including both open- and closed-source models, we evaluate personalization performance via a three-stage protocol: concept injection, multi-turn dialogue, and personalized querying. Our findings indicate that most VLMs (including some closed-source models) struggle with personalization, particularly in maintaining consistency over dialogue, handling user preferences, and adapting to visual cues. Our analysis reveals that the challenges in VLM personalization (such as refusal behaviors and long-context forgetting) highlight substantial room for improvement. By identifying these limitations and offering a scalable benchmark, MMPB offers valuable insights and a solid foundation for future research toward truly personalized multi-modal AI. Project Page: aidaslab.github.io/MMPB
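The three-stage protocol described above (concept injection, multi-turn dialogue, personalized querying) can be sketched as a message-assembly routine. This is a minimal illustration, not MMPB's actual harness: the `Concept` dataclass, the message-dict format, and the placeholder assistant replies are all assumptions for the sake of the sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    """A personalizable concept, e.g. a specific pet or object.
    Field names here are illustrative, not from the benchmark."""
    name: str                 # user-given name, e.g. "<my-dog>"
    category: str             # humans / animals / objects / characters
    reference_images: list = field(default_factory=list)

def build_personalization_session(concept, distractor_turns, query):
    """Assemble the three-stage evaluation sequence:
    (1) inject the concept, (2) pad with intervening dialogue turns
    to stress long-context retention, (3) pose the personalized query."""
    # Stage 1: concept injection with reference images
    messages = [{
        "role": "user",
        "content": f"This is {concept.name}, a concept in the "
                   f"'{concept.category}' category. Remember it.",
        "images": list(concept.reference_images),
    }]
    # Stage 2: multi-turn dialogue unrelated to the concept
    for turn in distractor_turns:
        messages.append({"role": "user", "content": turn})
        messages.append({"role": "assistant", "content": "(model reply)"})
    # Stage 3: personalized query referencing the injected concept
    messages.append({"role": "user", "content": query})
    return messages
```

A session built this way could then be sent to any chat-style VLM; checking whether the final answer still honors the stage-1 concept is what probes the consistency and long-context forgetting issues the benchmark reports.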