Adversarial Video Promotion Against Text-to-Video Retrieval

August 9, 2025
Authors: Qiwei Tian, Chenhao Lin, Zhengyu Zhao, Qian Li, Shuai Liu, Chao Shen
cs.AI

Abstract

Thanks to the development of cross-modal models, text-to-video retrieval (T2VR) is advancing rapidly, but its robustness remains largely unexamined. Existing attacks against T2VR are designed to push videos away from queries, i.e., to suppress the ranks of videos, while attacks that pull videos towards selected queries, i.e., promote the ranks of videos, remain largely unexplored. Promotion attacks can be more impactful, since attackers may gain more views/clicks for financial benefit and spread (mis)information widely. To this end, we present the first attack against T2VR that promotes videos adversarially, dubbed the Video Promotion attack (ViPro). We further propose Modal Refinement (MoRe) to capture the finer-grained, intricate interaction between visual and textual modalities and thereby enhance black-box transferability. Comprehensive experiments cover 2 existing baselines, 3 leading T2VR models, and 3 prevailing datasets with over 10k videos, evaluated under 3 scenarios. All experiments are conducted in a multi-target setting to reflect realistic scenarios in which attackers seek to promote a video for multiple queries simultaneously. We also evaluate our attacks against defences and for imperceptibility. Overall, ViPro surpasses other baselines by over 30/10/4% on average in the white/grey/black-box settings. Our work highlights an overlooked vulnerability, provides a qualitative analysis of the upper/lower bounds of our attacks, and offers insights into potential counterplays. Code will be publicly available at https://github.com/michaeltian108/ViPro.
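
To make the terms "rank" and "multi-target" concrete, the following is a minimal sketch of how a video's rank for a query could be measured when retrieval is scored by cosine similarity between embeddings, and how that rank is averaged over several target queries. It is not the authors' method or code: the cosine scoring, the helper names `rank_of` and `mean_rank`, and the toy 2-D embeddings are all assumptions made for illustration, and the "after" embedding is simply chosen by hand rather than produced by any attack.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def rank_of(video_idx, query, videos):
    """1-based rank of videos[video_idx] for the query (1 = top result)."""
    target = cosine(query, videos[video_idx])
    better = sum(
        1
        for i, v in enumerate(videos)
        if i != video_idx and cosine(query, v) > target
    )
    return 1 + better

def mean_rank(video_idx, queries, videos):
    """Average rank of one video over several target queries (multi-target view)."""
    return sum(rank_of(video_idx, q, videos) for q in queries) / len(queries)

# Toy, hand-picked embeddings (assumed, not taken from any real model).
queries = [[1.0, 0.1], [0.1, 1.0]]
videos_before = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
# Hypothetical embedding of video 2 after some change to the video.
videos_after = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]

before = mean_rank(2, queries, videos_before)
after = mean_rank(2, queries, videos_after)
print(before, after)
```

In this toy setup a lower mean rank means the video appears higher in the results for both queries at once, which is the quantity a promotion attack in a multi-target setting would aim to reduce.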