DOEI: Dual Optimization of Embedding Information for Attention-Enhanced Class Activation Maps

February 21, 2025
Authors: Hongjie Zhu, Zeyu Zhang, Guansong Pang, Xu Wang, Shimin Wen, Yu Bai, Daji Ergu, Ying Cai, Yang Zhao
cs.AI

Abstract

Weakly supervised semantic segmentation (WSSS) typically uses limited semantic annotations to obtain initial Class Activation Maps (CAMs). However, because class activation responses and semantic information are inadequately coupled in high-dimensional space, CAMs are prone to object co-occurrence or under-activation, which results in inferior recognition accuracy. To tackle this issue, we propose DOEI (Dual Optimization of Embedding Information), a novel approach that reconstructs embedding representations through semantic-aware attention weight matrices to optimize the expressive capability of the embedding information. Specifically, DOEI amplifies tokens with high confidence and suppresses those with low confidence during the class-to-patch interaction. This alignment of activation responses with semantic information strengthens the propagation and decoupling of target features, enabling the generated embeddings to represent target features more accurately in the high-level semantic space. In addition, we propose a hybrid-feature alignment module in DOEI that combines RGB values, embedding-guided features, and self-attention weights to increase the reliability of candidate tokens. Comprehensive experiments show that DOEI is an effective plug-and-play module that enables state-of-the-art vision transformer-based WSSS models to significantly improve CAM quality and segmentation performance on popular benchmarks, including PASCAL VOC (+3.6%, +1.5%, +1.2% mIoU) and MS COCO (+1.2%, +1.6% mIoU). Code will be available at https://github.com/AIGeeksGroup/DOEI.
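
The abstract describes amplifying high-confidence tokens and suppressing low-confidence ones during the class-to-patch interaction, but it does not give the actual formulation. The following is only a rough illustrative sketch of that general idea in plain Python, not the authors' implementation: the function name `reweight_class_to_patch`, the relative-confidence measure (each weight divided by the largest weight), and the `high`, `low`, `gain`, and `damp` parameters are all hypothetical choices made for this example.

```python
from typing import List


def reweight_class_to_patch(
    attn: List[float],
    high: float = 0.6,   # hypothetical threshold for "high confidence"
    low: float = 0.3,    # hypothetical threshold for "low confidence"
    gain: float = 1.5,   # hypothetical amplification factor
    damp: float = 0.5,   # hypothetical suppression factor
) -> List[float]:
    """Rescale one class token's attention over patch tokens.

    `attn` holds non-negative class-to-patch attention weights.
    Confidence is assumed here to be the weight relative to the
    largest weight; the paper may define it differently.
    """
    peak = max(attn)
    if peak <= 0.0:
        # Nothing to rescale when every weight is zero.
        return list(attn)

    scaled = []
    for w in attn:
        confidence = w / peak
        if confidence >= high:
            scaled.append(w * gain)   # amplify high-confidence patches
        elif confidence <= low:
            scaled.append(w * damp)   # suppress low-confidence patches
        else:
            scaled.append(w)          # leave uncertain patches unchanged

    # Renormalize so the weights again sum to 1.
    total = sum(scaled)
    return [w / total for w in scaled]


if __name__ == "__main__":
    weights = [0.40, 0.30, 0.20, 0.10]
    print(reweight_class_to_patch(weights))
```

With these assumed defaults, the two largest weights in the example gain share after renormalization and the smallest loses share, which is the qualitative behavior the abstract describes. How DOEI actually derives confidence, and how the hybrid-feature alignment module selects candidate tokens, should be taken from the paper and its released code.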
