STMI: Segmentation-Guided Token Modulation with Cross-Modal Hypergraph Interaction for Multi-Modal Object Re-Identification
February 28, 2026
Authors: Xingguo Xu, Zhanyu Liu, Weixiang Zhou, Yuansheng Gao, Junjie Cao, Yuhao Wang, Jixiang Luo, Dell Zhang
cs.AI
Abstract
Multi-modal object Re-Identification (ReID) aims to exploit complementary information from different modalities to retrieve specific objects. However, existing methods often rely on hard token filtering or simple fusion strategies, which can lead to the loss of discriminative cues and increased background interference. To address these challenges, we propose STMI, a novel multi-modal learning framework consisting of three key components: (1) a Segmentation-Guided Feature Modulation (SFM) module that leverages SAM-generated masks to enhance foreground representations and suppress background noise through learnable attention modulation; (2) a Semantic Token Reallocation (STR) module that employs learnable query tokens and an adaptive reallocation mechanism to extract compact and informative representations without discarding any tokens; and (3) a Cross-Modal Hypergraph Interaction (CHI) module that constructs a unified hypergraph across modalities to capture high-order semantic relationships. Extensive experiments on public benchmarks (i.e., RGBNT201, RGBNT100, and MSVR310) demonstrate the effectiveness and robustness of our proposed STMI framework in multi-modal ReID scenarios.
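As a concrete illustration of the segmentation-guided modulation idea, below is a minimal PyTorch sketch, assuming ViT patch tokens and a per-patch foreground mask derived from SAM. The class name `SegGuidedModulation`, the gating MLP, and the learnable background floor are illustrative assumptions, not the paper's actual implementation.

```python
# A minimal sketch of segmentation-guided token modulation, assuming patch
# tokens of shape (B, N, D) and a SAM-derived foreground mask downsampled to
# one value per patch. All names and design details here are hypothetical.
import torch
import torch.nn as nn


class SegGuidedModulation(nn.Module):
    """Reweights patch tokens with a learnable, mask-conditioned gate."""

    def __init__(self, dim: int):
        super().__init__()
        # Small MLP mapping each token (plus its mask value) to a gate in (0, 1).
        self.gate = nn.Sequential(
            nn.Linear(dim + 1, dim // 4),
            nn.GELU(),
            nn.Linear(dim // 4, 1),
            nn.Sigmoid(),
        )
        # Learnable floor keeps background tokens attenuated, not discarded.
        self.bg_floor = nn.Parameter(torch.tensor(0.1))

    def forward(self, tokens: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D); mask: (B, N) with 1 = foreground, 0 = background.
        m = mask.unsqueeze(-1)                     # (B, N, 1)
        g = self.gate(torch.cat([tokens, m], -1))  # (B, N, 1) learned gate
        # Foreground tokens keep their learned gate; background tokens are
        # softly suppressed toward bg_floor rather than hard-filtered.
        scale = m * g + (1 - m) * self.bg_floor * g
        return tokens * scale


# Usage: modulate ViT patch tokens with a per-patch foreground mask.
mod = SegGuidedModulation(dim=768)
tokens = torch.randn(2, 196, 768)          # e.g., 14x14 patches
mask = (torch.rand(2, 196) > 0.5).float()  # stand-in for a SAM mask
out = mod(tokens, mask)                    # same shape: (2, 196, 768)
```

The design choice mirrored here is soft attenuation: background tokens are downweighted by a learnable gate instead of being dropped outright, consistent with the abstract's claim that no tokens are discarded.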