DOEI: Dual Optimization of Embedding Information for Attention-Enhanced Class Activation Maps

Abstract

Weakly supervised semantic segmentation (WSSS) typically uses limited semantic annotations to obtain initial Class Activation Maps (CAMs). However, because class activation responses and semantic information are inadequately coupled in high-dimensional space, CAMs are prone to object co-occurrence errors and under-activation, which lowers recognition accuracy. To tackle this issue, we propose DOEI (Dual Optimization of Embedding Information), a novel approach that reconstructs embedding representations through semantic-aware attention weight matrices to optimize the expressive capability of embedding information. Specifically, DOEI amplifies high-confidence tokens and suppresses low-confidence tokens during the class-to-patch interaction. This alignment of activation responses with semantic information strengthens the propagation and decoupling of target features, enabling the generated embeddings to represent target features more accurately in high-level semantic space. In addition, we propose a hybrid-feature alignment module in DOEI that combines RGB values, embedding-guided features, and self-attention weights to increase the reliability of candidate tokens. Comprehensive experiments show that DOEI is an effective plug-and-play module that enables state-of-the-art vision transformer-based WSSS models to significantly improve CAM quality and segmentation performance on popular benchmarks, including PASCAL VOC (+3.6%, +1.5%, +1.2% mIoU) and MS COCO (+1.2%, +1.6% mIoU). Code will be available at https://github.com/AIGeeksGroup/DOEI.
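
The amplify-and-suppress step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `reweight_class_to_patch`, the thresholds `high` and `low`, the factors `gain` and `damp`, and the use of min-max-normalized attention as the confidence score are all assumptions made for illustration. The abstract does not specify how confidence is measured or how tokens are rescaled.

```python
import numpy as np


def reweight_class_to_patch(attn, high=0.6, low=0.2, gain=1.5, damp=0.5):
    """Hypothetical reweighting of class-to-patch attention.

    attn: array of shape (num_classes, num_patches); each row is assumed
          to be non-negative and to sum to 1.
    Returns an array of the same shape whose rows again sum to 1.
    """
    attn = np.asarray(attn, dtype=float)

    # Assumption: per-row min-max normalisation serves as token confidence.
    row_min = attn.min(axis=1, keepdims=True)
    row_max = attn.max(axis=1, keepdims=True)
    conf = (attn - row_min) / np.maximum(row_max - row_min, 1e-12)

    # Amplify high-confidence tokens, suppress low-confidence ones,
    # and leave the remaining tokens unchanged.
    scale = np.ones_like(attn)
    scale[conf >= high] = gain
    scale[conf <= low] = damp

    # Renormalise so each class row is a distribution over patches again.
    out = attn * scale
    return out / out.sum(axis=1, keepdims=True)


# Toy example: one class token attending to three patch tokens.
attn = np.array([[0.1, 0.2, 0.7]])
print(reweight_class_to_patch(attn))
```

In this toy case the most strongly attended patch gains weight while the two weakly attended patches lose weight. A real system would presumably derive confidence from richer signals, such as the RGB values, embedding-guided features, and self-attention weights that the hybrid-feature alignment module combines.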
