

Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval

September 11, 2025
作者: Tianlu Zheng, Yifan Zhang, Xiang An, Ziyong Feng, Kaicheng Yang, Qichuan Ding
cs.AI

Abstract
Although Contrastive Language-Image Pre-training (CLIP) exhibits strong performance across diverse vision tasks, its application to person representation learning faces two critical challenges: (i) the scarcity of large-scale annotated vision-language data focused on person-centric images, and (ii) the inherent limitations of global contrastive learning, which struggles to maintain discriminative local features crucial for fine-grained matching while remaining vulnerable to noisy text tokens. This work advances CLIP for person representation learning through synergistic improvements in data curation and model architecture. First, we develop a noise-resistant data construction pipeline that leverages the in-context learning capabilities of multimodal large language models (MLLMs) to automatically filter and caption web-sourced images. This yields WebPerson, a large-scale dataset of 5M high-quality person-centric image-text pairs. Second, we introduce the GA-DMS (Gradient-Attention Guided Dual-Masking Synergetic) framework, which improves cross-modal alignment by adaptively masking noisy textual tokens based on the gradient-attention similarity score. Additionally, we incorporate a masked token prediction objective that compels the model to predict informative text tokens, enhancing fine-grained semantic representation learning. Extensive experiments show that GA-DMS achieves state-of-the-art performance across multiple benchmarks.
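The abstract describes the core GA-DMS idea, adaptively masking noisy text tokens using a score that combines attention with gradient information, but does not give the exact formulation. The following NumPy sketch illustrates one plausible reading: each token's informativeness is scored by its attention weight times the gradient magnitude of the contrastive loss at its embedding, and the lowest-scoring tokens are masked. The function names, the product-based score, and the `noise_ratio` parameter are illustrative assumptions, not the paper's actual definitions.

```python
import numpy as np

def gradient_attention_scores(attn, grads):
    """Score each text token by attention weight x gradient magnitude.

    attn:  (T,) attention each token receives from the global text query.
    grads: (T, D) gradient of the contrastive loss w.r.t. token embeddings.
    Returns a (T,) score normalized to sum to 1; low scores mark noisy tokens.
    """
    grad_mag = np.linalg.norm(grads, axis=-1)   # (T,) per-token gradient magnitude
    score = attn * grad_mag                     # simple attention-gradient product
    return score / (score.sum() + 1e-8)         # normalize to a distribution

def mask_noisy_tokens(token_ids, score, noise_ratio=0.15, mask_id=0):
    """Replace the lowest-scoring fraction of tokens with a mask id."""
    k = max(1, int(len(token_ids) * noise_ratio))
    noisy = np.argsort(score)[:k]               # indices of least informative tokens
    masked = np.array(token_ids, copy=True)
    masked[noisy] = mask_id
    return masked, noisy
```

In the actual framework these masked positions would also feed the masked token prediction objective, i.e. the model is trained to recover the informative tokens it cannot see, rather than only being shielded from noisy ones.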