EgoNormia: Benchmarking Physical Social Norm Understanding

February 27, 2025
Authors: MohammadHossein Rezaei, Yicheng Fu, Phil Cuvin, Caleb Ziems, Yanzhe Zhang, Hao Zhu, Diyi Yang
cs.AI

Abstract

Human activity is moderated by norms. When performing actions in the real world, humans not only follow norms but also consider the trade-offs between different norms. However, machines are often trained without explicit supervision on norm understanding and reasoning, especially when the norms are grounded in a physical and social context. To improve and evaluate the normative reasoning capability of vision-language models (VLMs), we present EgoNormia |ε|, a dataset of 1,853 egocentric videos of human interactions, each paired with two related questions that evaluate both the prediction and the justification of normative actions. The normative actions encompass seven categories: safety, privacy, proxemics, politeness, cooperation, coordination/proactivity, and communication/legibility. To compile this dataset at scale, we propose a novel pipeline leveraging video sampling, automatic answer generation, filtering, and human validation. Our work demonstrates that current state-of-the-art vision-language models lack robust norm understanding, scoring a maximum of 45% on EgoNormia (versus a human baseline of 92%). Our analysis of performance in each dimension highlights significant safety and privacy risks, as well as a lack of collaboration and communication capability, when these models are applied to real-world agents. We additionally show that, through a retrieval-based generation method, it is possible to use EgoNormia to enhance normative reasoning in VLMs.
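
To make the evaluation setup concrete, the following is a minimal sketch of how per-category accuracy on the two question types (action prediction and justification) might be computed. It is not the authors' evaluation code. The `Item` record, its field names, the single-category-per-item simplification, and the exact-match comparison against a gold answer index are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    """One benchmark item (hypothetical schema, not the dataset's real format)."""
    category: str            # e.g. "safety", "privacy" (assumed: one category per item)
    action_gold: int         # index of the gold answer to the action question
    action_pred: int         # index the model chose for the action question
    justification_gold: int  # index of the gold answer to the justification question
    justification_pred: int  # index the model chose for the justification question


def accuracy_by_category(items):
    """Return {category: {"action": acc, "justification": acc}} using exact match."""
    # Per category: [correct action answers, correct justifications, item count]
    totals = defaultdict(lambda: [0, 0, 0])
    for it in items:
        row = totals[it.category]
        row[0] += int(it.action_pred == it.action_gold)
        row[1] += int(it.justification_pred == it.justification_gold)
        row[2] += 1
    return {
        cat: {"action": a / n, "justification": j / n}
        for cat, (a, j, n) in totals.items()
    }


if __name__ == "__main__":
    # Toy data invented for this example; these are not results from the paper.
    demo = [
        Item("safety", 0, 0, 1, 1),
        Item("safety", 2, 1, 3, 3),
        Item("privacy", 1, 1, 0, 2),
    ]
    print(accuracy_by_category(demo))
```

Reporting the two question types separately, and per category, mirrors the per-dimension analysis the abstract describes: a model can choose the right action while failing to justify it, or perform well on politeness while failing on safety.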
