Aligning VLM Assistants with Personalized Situated Cognition
June 1, 2025
Authors: Yongqi Li, Shen Zhou, Xiaohu Li, Xin Miao, Jintao Wen, Mayi Xu, Jianhao Chen, Birong Pan, Hankun Kang, Yuanyuan Zhu, Ming Zhong, Tieyun Qian
cs.AI
Abstract
Vision-language models (VLMs) aligned with general human objectives, such as being harmless and hallucination-free, have become valuable assistants to humans in managing visual tasks. However, people from diverse backgrounds have different cognition even in the same situation. Consequently, they may have personalized expectations for VLM assistants. This highlights the urgent need to align VLM assistants with personalized situated cognition for real-world assistance. To study this problem, we first simplify it by characterizing individuals based on the sociological concept of Role-Set. Then, we propose evaluating individuals' actions to examine whether personalized alignment is achieved. Further, we construct a benchmark named PCogAlignBench, which includes 18k instances and 20 individuals with different Role-Sets. Finally, we present a framework called PCogAlign, which constructs a cognition-aware and action-based reward model for personalized alignment. Experimental results and human evaluations demonstrate the reliability of PCogAlignBench and the effectiveness of the proposed PCogAlign. We will open-source the constructed benchmark and code at https://github.com/NLPGM/PCogAlign.
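To make the reward-model idea in the abstract concrete, the sketch below shows one generic way a personalized reward model could be used at inference time: score candidate VLM responses against an individual's Role-Set and situation, then return the highest-scoring one (best-of-N selection). This is a minimal illustration under assumed names (`Individual`, `select_personalized_response`, `toy_reward`); it is not the actual PCogAlign implementation, whose details are in the paper and repository.

```python
# Hypothetical sketch: best-of-N selection with a personalized reward model.
# All names here are illustrative assumptions, not the PCogAlign API.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Individual:
    """A user characterized by a Role-Set (e.g., roles at home and at work)."""
    role_set: List[str]


def select_personalized_response(
    situation: str,
    candidates: List[str],
    individual: Individual,
    reward_fn: Callable[[str, str, Individual], float],
) -> str:
    """Return the candidate response the reward model judges most helpful
    for this individual's situated cognition in the given situation."""
    scored = [(reward_fn(situation, c, individual), c) for c in candidates]
    return max(scored, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    user = Individual(role_set=["nurse", "parent of a toddler"])

    def toy_reward(situation: str, response: str, person: Individual) -> float:
        # A real reward model would score how well the suggested action fits
        # the person's roles and the visual situation; this stand-in simply
        # prefers responses that mention one of the user's roles.
        return float(sum(role.split()[0] in response.lower() for role in person.role_set))

    candidates = [
        "Take a short break and stretch.",
        "As a nurse on a long shift, schedule a brief rest before rounds.",
    ]
    print(select_personalized_response("crowded hospital ward", candidates, user, toy_reward))
```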