Selective Contrastive Learning for Weakly Supervised Affordance Grounding
August 11, 2025
Authors: WonJun Moon, Hyun Seok Seong, Jae-Pil Heo
cs.AI
Abstract
Facilitating an entity's interaction with objects requires accurately
identifying parts that afford specific actions. Weakly supervised affordance
grounding (WSAG) seeks to imitate human learning from third-person
demonstrations, where humans intuitively grasp functional parts without needing
pixel-level annotations. To achieve this, grounding is typically learned using
a shared classifier across images from different perspectives, along with
distillation strategies incorporating a part discovery process. However, since
affordance-relevant parts are not always easily distinguishable, models
primarily rely on classification, often focusing on common class-specific
patterns that are unrelated to affordance. To address this limitation, we move
beyond isolated part-level learning by introducing selective prototypical and
pixel contrastive objectives that adaptively learn affordance-relevant cues at
both the part and object levels, depending on the granularity of the available
information. Initially, we find the action-associated objects in both
egocentric (object-focused) and exocentric (third-person example) images by
leveraging CLIP. Then, by cross-referencing the discovered objects of
complementary views, we excavate the precise part-level affordance clues in
each perspective. By consistently learning to distinguish affordance-relevant
regions from affordance-irrelevant background context, our approach effectively
shifts activation from irrelevant areas toward meaningful affordance cues.
Experimental results demonstrate the effectiveness of our method. Code is
available at github.com/hynnsk/SelectiveCL.
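
To make the CLIP-based discovery step concrete, here is a minimal sketch, assuming the openai CLIP package: it ranks candidate object names against an egocentric or exocentric image by image-text similarity. The prompt template, the candidate object list, and the helper name action_object_scores are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): pick the action-associated
# object for an image by CLIP image-text similarity.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def action_object_scores(image_path, action, candidate_objects):
    """Score candidate objects for an action via CLIP image-text similarity."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    # Prompt wording is an assumption for this example.
    prompts = [f"a photo of a {obj} to {action}" for obj in candidate_objects]
    tokens = clip.tokenize(prompts).to(device)
    with torch.no_grad():
        img_feat = model.encode_image(image)
        txt_feat = model.encode_text(tokens)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    sims = (img_feat @ txt_feat.T).squeeze(0)  # cosine similarity per object
    return dict(zip(candidate_objects, sims.tolist()))

# Example usage (paths and object names are placeholders):
# scores = action_object_scores("egocentric.jpg", "cut", ["knife", "cup", "bicycle"])
```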
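
The selective prototypical/pixel contrastive idea can likewise be sketched as a loss that pulls affordance-relevant pixels toward their prototype while pushing background pixels away. This is an illustrative PyTorch sketch, not the released objective; the pseudo-mask source, the function name selective_prototype_contrast, and the temperature value are assumptions.

```python
# Illustrative sketch of a prototype-vs-background contrastive objective.
import torch
import torch.nn.functional as F

def selective_prototype_contrast(feats, mask, tau=0.1):
    """
    feats: (D, H, W) pixel embeddings from one image.
    mask:  (H, W) binary pseudo-mask, 1 = affordance-relevant, 0 = background.
    Pulls relevant pixels toward their prototype, pushes background away.
    """
    D, H, W = feats.shape
    feats = F.normalize(feats.view(D, -1), dim=0)            # (D, H*W), unit norm
    mask = mask.view(-1).bool()
    if mask.sum() == 0 or (~mask).sum() == 0:
        return feats.new_zeros(())                           # skip degenerate masks
    proto = F.normalize(feats[:, mask].mean(dim=1), dim=0)   # (D,) prototype
    pos_sim = (proto @ feats[:, mask]) / tau                 # relevant pixels
    neg_sim = (proto @ feats[:, ~mask]) / tau                # background pixels
    # InfoNCE-style: each relevant pixel against all background pixels.
    neg_term = torch.logsumexp(neg_sim, dim=0)
    loss = -(pos_sim - torch.logaddexp(pos_sim, neg_term)).mean()
    return loss
```

In practice the pseudo-mask would come from the cross-referenced object/part discovery described in the abstract; here it is simply assumed as an input.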