INTRA: Interaction Relationship-aware Weakly Supervised Affordance Grounding

September 10, 2024
Authors: Ji Ha Jang, Hoigi Seo, Se Young Chun
cs.AI

Abstract

Affordance denotes the potential interactions inherent in objects. The perception of affordance can enable intelligent agents to navigate and interact with new environments efficiently. Weakly supervised affordance grounding teaches agents the concept of affordance from exocentric images, without costly pixel-level annotations. Although recent advances in weakly supervised affordance grounding have yielded promising results, challenges remain, including the requirement for paired exocentric and egocentric image datasets and the complexity of grounding diverse affordances for a single object. To address them, we propose INTeraction Relationship-aware weakly supervised Affordance grounding (INTRA). Unlike prior work, INTRA recasts this problem as representation learning, identifying unique features of interactions through contrastive learning with exocentric images only, which eliminates the need for paired datasets. Moreover, we leverage vision-language model embeddings to perform affordance grounding flexibly with any text, design text-conditioned affordance map generation to reflect interaction relationships for contrastive learning, and enhance robustness with our text synonym augmentation. Our method outperforms prior work on diverse datasets such as AGD20K, IIT-AFF, CAD and UMD. Additionally, experimental results demonstrate that our method has remarkable domain scalability to synthesized images and illustrations and can perform affordance grounding for novel interactions and objects.
