EgoNormia: Benchmarking Physical Social Norm Understanding
February 27, 2025
Authors: MohammadHossein Rezaei, Yicheng Fu, Phil Cuvin, Caleb Ziems, Yanzhe Zhang, Hao Zhu, Diyi Yang
cs.AI
Abstract
Human activity is moderated by norms. When performing actions in the real
world, humans not only follow norms, but also consider the trade-off between
different norms. However, machines are often trained without explicit
supervision on norm understanding and reasoning, especially when the norms are
grounded in a physical and social context. To improve and evaluate the
normative reasoning capability of vision-language models (VLMs), we present
EgoNormia |ε|, consisting of 1,853 ego-centric videos of human
interactions, each of which has two related questions evaluating both the
prediction and justification of normative actions. The normative actions
encompass seven categories: safety, privacy, proxemics, politeness,
cooperation, coordination/proactivity, and communication/legibility. To compile
this dataset at scale, we propose a novel pipeline leveraging video sampling,
automatic answer generation, filtering, and human validation. Our work
demonstrates that current state-of-the-art vision-language models lack robust
norm understanding, scoring a maximum of 45% on EgoNormia (versus a human
benchmark of 92%). Our analysis of performance in each dimension highlights
significant safety and privacy risks, as well as a lack of collaboration and
communication capability, when these models are applied to real-world agents. We additionally
show that through a retrieval-based generation method, it is possible to use
EgoNormia to enhance normative reasoning in VLMs.