Aligning VLM Assistants with Personalized Situated Cognition
June 1, 2025
Authors: Yongqi Li, Shen Zhou, Xiaohu Li, Xin Miao, Jintao Wen, Mayi Xu, Jianhao Chen, Birong Pan, Hankun Kang, Yuanyuan Zhu, Ming Zhong, Tieyun Qian
cs.AI
Abstract
Vision-language models (VLMs) aligned with general human objectives, such as being harmless and hallucination-free, have become valuable assistants to humans in managing visual tasks. However, people from diverse backgrounds have different cognition even in the same situation. Consequently, they may have personalized expectations for VLM assistants. This highlights the urgent need to align VLM assistants with personalized situated cognition for real-world assistance. To study this problem, we first simplify it by characterizing individuals based on the sociological concept of Role-Set. Then, we propose evaluating individuals' actions to examine whether personalized alignment is achieved. Further, we construct a benchmark named PCogAlignBench, which includes 18k instances and 20 individuals with different Role-Sets. Finally, we present a framework called PCogAlign, which constructs a cognition-aware and action-based reward model for personalized alignment. Experimental results and human evaluations demonstrate the reliability of PCogAlignBench and the effectiveness of the proposed PCogAlign. We will open-source the constructed benchmark and code at https://github.com/NLPGM/PCogAlign.
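To make the reward-model idea in the abstract concrete, the sketch below shows one generic way a personalized reward model could be used at inference time: score candidate VLM responses against an individual's Role-Set and situation, then return the highest-scoring one (best-of-N selection). This is a minimal illustration under assumed names (`Individual`, `select_personalized_response`, `toy_reward`); it is not the actual PCogAlign implementation, whose details are in the paper and repository.

```python
# Hypothetical sketch: best-of-N selection with a personalized reward model.
# All names here are illustrative assumptions, not the PCogAlign API.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Individual:
    """A user characterized by a Role-Set (e.g., roles at home and at work)."""
    role_set: List[str]


def select_personalized_response(
    situation: str,
    candidates: List[str],
    individual: Individual,
    reward_fn: Callable[[str, str, Individual], float],
) -> str:
    """Return the candidate response the reward model judges most helpful
    for this individual's situated cognition in the given situation."""
    scored = [(reward_fn(situation, c, individual), c) for c in candidates]
    return max(scored, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    user = Individual(role_set=["nurse", "parent of a toddler"])

    def toy_reward(situation: str, response: str, person: Individual) -> float:
        # A real reward model would score how well the suggested action fits
        # the person's roles and the visual situation; this stand-in simply
        # prefers responses that mention one of the user's roles.
        return float(sum(role.split()[0] in response.lower() for role in person.role_set))

    candidates = [
        "Take a short break and stretch.",
        "As a nurse on a long shift, schedule a brief rest before rounds.",
    ]
    print(select_personalized_response("crowded hospital ward", candidates, user, toy_reward))
```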