DOEI: Dual Optimization of Embedding Information for Attention-Enhanced Class Activation Maps

Abstract

Weakly supervised semantic segmentation (WSSS) typically relies on limited semantic annotations to obtain initial Class Activation Maps (CAMs). However, because class activation responses are inadequately coupled with semantic information in high-dimensional space, CAMs are prone to object co-occurrence or under-activation, which lowers recognition accuracy. To tackle this issue, we propose DOEI (Dual Optimization of Embedding Information), a novel approach that reconstructs embedding representations through semantic-aware attention weight matrices to improve the expressive capability of the embeddings. Specifically, during class-to-patch interaction, DOEI amplifies tokens with high confidence and suppresses those with low confidence. Aligning activation responses with semantic information in this way strengthens the propagation and decoupling of target features, so that the generated embeddings represent target features more accurately in high-level semantic space. In addition, DOEI includes a hybrid-feature alignment module that combines RGB values, embedding-guided features, and self-attention weights to increase the reliability of candidate tokens. Comprehensive experiments show that DOEI is an effective plug-and-play module: it enables state-of-the-art vision transformer-based WSSS models to substantially improve CAM quality and segmentation performance on popular benchmarks, including PASCAL VOC (+3.6%, +1.5%, and +1.2% mIoU) and MS COCO (+1.2% and +1.6% mIoU). Code will be available at https://github.com/AIGeeksGroup/DOEI.
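The abstract does not give the exact formulation, so the sketch below is a hypothetical illustration of the two ideas it names, not the DOEI implementation. It assumes a class-to-patch attention matrix `cls_attn` of shape (C, N), a row-stochastic patch self-attention matrix, min-max normalized confidence, and fixed thresholds and scale factors (`hi`, `lo`, `amp`, `sup`); all of these names and values are our assumptions. `reconstruct_embeddings` amplifies high-confidence tokens, suppresses low-confidence ones, and rebuilds the embeddings through the reweighted attention matrix. `hybrid_affinity` is a hypothetical helper that merges RGB, embedding, and self-attention similarities into one affinity, in the spirit of the hybrid-feature alignment module.

```python
import numpy as np


def reconstruct_embeddings(patch_emb, patch_attn, cls_attn,
                           hi=0.6, lo=0.2, amp=2.0, sup=0.5):
    """Hypothetical confidence-based reweighting (not the official DOEI code).

    patch_emb:  (N, D) patch token embeddings
    patch_attn: (N, N) patch-to-patch self-attention, rows sum to 1
    cls_attn:   (C, N) non-negative class-to-patch attention
    Returns the reconstructed (N, D) embeddings and the per-token weights (N,).
    """
    # Min-max normalize each class row to [0, 1] so one threshold fits all classes.
    row_min = cls_attn.min(axis=1, keepdims=True)
    row_max = cls_attn.max(axis=1, keepdims=True)
    conf = (cls_attn - row_min) / np.maximum(row_max - row_min, 1e-8)
    # A patch's confidence is its strongest response over all classes.
    token_conf = conf.max(axis=0)
    # Amplify confident tokens, suppress unconfident ones, leave the rest as is.
    w = np.where(token_conf >= hi, amp, np.where(token_conf < lo, sup, 1.0))
    # Reweight the attention paid TO each token, then re-normalize every row.
    attn = patch_attn * w[None, :]
    attn = attn / np.maximum(attn.sum(axis=1, keepdims=True), 1e-12)
    # Rebuild each embedding as an attention-weighted mix of all patch embeddings.
    return attn @ patch_emb, w


def hybrid_affinity(rgb, emb, self_attn, weights=(1 / 3, 1 / 3, 1 / 3), sigma=0.1):
    """Hypothetical hybrid affinity from RGB, embedding, and attention cues.

    rgb: (N, 3) mean patch colors in [0, 1]; emb: (N, D); self_attn: (N, N).
    Returns a symmetric (N, N) affinity matrix.
    """
    # Gaussian kernel on color distance: similar colors give values near 1.
    d2 = ((rgb[:, None, :] - rgb[None, :, :]) ** 2).sum(axis=-1)
    rgb_aff = np.exp(-d2 / (2 * sigma ** 2))
    # Cosine similarity of embeddings, mapped from [-1, 1] to [0, 1].
    e = emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-8)
    emb_aff = (e @ e.T + 1.0) / 2.0
    # Symmetrize attention so the affinity does not depend on direction.
    attn_aff = (self_attn + self_attn.T) / 2.0
    a, b, c = weights
    return a * rgb_aff + b * emb_aff + c * attn_aff
```

For example, with four tokens, uniform self-attention (every entry 0.25), and a single class whose attention is `[0.9, 0.5, 0.1, 0.0]`, the weights come out as `[2.0, 1.0, 0.5, 0.5]`, so after reconstruction every token draws half of its new embedding from the most confident token. Scaling attention columns rather than the embeddings themselves keeps each reconstructed embedding a convex combination of the original ones.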
