DOEI: Dual Optimization of Embedding Information for Attention-Enhanced Class Activation Maps

Abstract

Weakly supervised semantic segmentation (WSSS) typically uses limited semantic annotations to obtain initial Class Activation Maps (CAMs). However, because class activation responses are inadequately coupled with semantic information in high-dimensional space, CAMs are prone to object co-occurrence or under-activation, which lowers recognition accuracy. To address this, we propose DOEI (Dual Optimization of Embedding Information), a novel approach that reconstructs embedding representations through semantic-aware attention weight matrices to optimize the expressive capability of embedding information. Specifically, DOEI amplifies high-confidence tokens and suppresses low-confidence tokens during class-to-patch interaction. Aligning activation responses with semantic information in this way strengthens the propagation and decoupling of target features, so the generated embeddings represent target features more accurately in high-level semantic space. In addition, we propose a hybrid-feature alignment module in DOEI that combines RGB values, embedding-guided features, and self-attention weights to increase the reliability of candidate tokens. Comprehensive experiments show that DOEI is an effective plug-and-play module that enables state-of-the-art visual transformer-based WSSS models to significantly improve CAM quality and segmentation performance on popular benchmarks, including PASCAL VOC (+3.6%, +1.5%, +1.2% mIoU) and MS COCO (+1.2%, +1.6% mIoU). Code will be available at https://github.com/AIGeeksGroup/DOEI.
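The idea of amplifying high-confidence tokens and suppressing low-confidence ones can be illustrated with a minimal sketch. This is not the DOEI implementation: the abstract does not give the actual formula, so the function name `reweight_attention`, the thresholds `high` and `low`, and the factors `gain` and `damp` are all hypothetical choices made only for illustration. The sketch treats the softmax weight of each patch, relative to the strongest patch, as its confidence.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reweight_attention(scores, high=0.6, low=0.3, gain=1.5, damp=0.5):
    """Toy class-to-patch reweighting (all parameters are assumptions).

    scores: raw attention scores from one class token to each patch token.
    Patches whose relative confidence is >= high are amplified by `gain`,
    patches whose relative confidence is <= low are damped by `damp`,
    and the result is renormalised to sum to 1.
    """
    weights = softmax(scores)
    peak = max(weights)
    adjusted = []
    for w in weights:
        conf = w / peak  # relative confidence in (0, 1]
        if conf >= high:
            factor = gain
        elif conf <= low:
            factor = damp
        else:
            factor = 1.0
        adjusted.append(w * factor)
    total = sum(adjusted)
    return [a / total for a in adjusted]


if __name__ == "__main__":
    scores = [2.0, 1.9, 0.0, -1.0]  # made-up scores for four patches
    print("before:", softmax(scores))
    print("after: ", reweight_attention(scores))
```

In this toy setting the two strongest patches gain attention mass and the two weakest lose it, while the weights still form a valid distribution. How DOEI derives confidence and applies the adjustment inside the transformer is described in the paper itself, not here.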
