ChatPaper.ai

INTRA: Interaction Relationship-aware Weakly Supervised Affordance Grounding

September 10, 2024
Authors: Ji Ha Jang, Hoigi Seo, Se Young Chun
cs.AI

Abstract

Affordance denotes the potential interactions inherent in objects. Perceiving affordances enables intelligent agents to navigate and interact with new environments efficiently. Weakly supervised affordance grounding teaches agents the concept of affordance from exocentric images instead of costly pixel-level annotations. Although recent weakly supervised affordance grounding methods have yielded promising results, challenges remain, including the need for paired exocentric and egocentric image datasets and the difficulty of grounding diverse affordances for a single object. To address them, we propose INTeraction Relationship-aware weakly supervised Affordance grounding (INTRA). Unlike prior work, INTRA recasts the problem as representation learning that identifies the unique features of interactions through contrastive learning on exocentric images only, eliminating the need for paired datasets. Moreover, we leverage vision-language model embeddings so that affordance grounding can be performed flexibly with any text. We design text-conditioned affordance map generation that reflects interaction relationships for contrastive learning, and we improve robustness with text synonym augmentation. Our method outperformed prior work on diverse datasets such as AGD20K, IIT-AFF, CAD, and UMD. Experimental results further show that our method has remarkable domain scalability to synthesized images and illustrations, and that it can ground affordances for novel interactions and objects.
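The abstract does not specify INTRA's architecture or losses. Purely as an illustration under our own assumptions, the sketch below combines three ideas it names: a text-conditioned affordance map (here, cosine similarity between patch embeddings and a text embedding), text synonym augmentation (here, sampling a random paraphrase of the interaction label), and a supervised contrastive loss (SupCon-style) in which exocentric images sharing an interaction are positives. Random vectors stand in for vision-language model features, and every name (`SYNONYMS`, `TEXT_EMB`, `affordance_map`, `pooled_feature`, `sample_text`, `supcon_loss`) is hypothetical; this is not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D, P = 16, 49  # toy sizes: embedding dim, number of patches (a 7x7 grid)

# Hypothetical stand-in for a vision-language text encoder: one fixed random
# embedding per interaction phrase and per synonym.
SYNONYMS = {
    "cut": ["cut", "slice", "chop"],
    "hold": ["hold", "grasp", "grip"],
}
TEXT_EMB = {w: rng.normal(size=D) for words in SYNONYMS.values() for w in words}


def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def affordance_map(patch_feats, text_emb):
    """Text-conditioned map: cosine similarity of each patch (P, D) to the
    text embedding (D,), min-max rescaled to [0, 1]."""
    sim = l2_normalize(patch_feats) @ l2_normalize(text_emb)
    return (sim - sim.min()) / (sim.max() - sim.min() + 1e-8)


def pooled_feature(patch_feats, amap):
    """Affordance-weighted average of the patch features, shape (D,)."""
    weights = amap / (amap.sum() + 1e-8)
    return weights @ patch_feats


def sample_text(label, rng):
    """Synonym augmentation: embed a randomly chosen paraphrase of the label."""
    words = SYNONYMS[label]
    return TEXT_EMB[words[rng.integers(len(words))]]


def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss: features with the same interaction label
    are positives; every other sample in the batch is a negative."""
    z = l2_normalize(z)
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, z @ z.T / tau)  # exclude self-pairs
    shifted = sim - sim.max(axis=1, keepdims=True)  # numerically stable
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    per_anchor = [-log_prob[i, pos[i]].mean() for i in range(n) if pos[i].any()]
    if not per_anchor:
        raise ValueError("need at least two samples sharing a label")
    return float(np.mean(per_anchor))


# Usage: four exocentric images, two per interaction.
labels = np.array([0, 0, 1, 1])
names = ["cut", "hold"]
feats = []
for y in labels:
    patches = rng.normal(size=(P, D))  # stand-in for image patch features
    amap = affordance_map(patches, sample_text(names[y], rng))
    feats.append(pooled_feature(patches, amap))
print(f"contrastive loss: {supcon_loss(np.stack(feats), labels):.4f}")
```

Weighting the pooled feature by the text-conditioned map is one simple way to let the grounding decide which image regions get contrasted; how INTRA actually couples the map to its contrastive objective may differ.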

November 16, 2024