Grounding Language in Multi-Perspective Referential Communication
October 4, 2024
Authors: Zineng Tang, Lingjun Mao, Alane Suhr
cs.AI
Abstract
We introduce a task and dataset for referring expression generation and
comprehension in multi-agent embodied environments. In this task, two agents in
a shared scene must take into account one another's visual perspective, which
may be different from their own, to both produce and understand references to
objects in a scene and the spatial relations between them. We collect a dataset
of 2,970 human-written referring expressions, each paired with human
comprehension judgments, and evaluate the performance of automated models as
speakers and listeners paired with human partners, finding that model
performance in both reference generation and comprehension lags behind that of
pairs of human agents. Finally, we experiment with training an open-weight
speaker model using evidence of communicative success when paired with a
listener, improving communicative success from 58.9% to 69.3% and even
outperforming the strongest proprietary model.