Adversarial Video Promotion Against Text-to-Video Retrieval
August 9, 2025
Authors: Qiwei Tian, Chenhao Lin, Zhengyu Zhao, Qian Li, Shuai Liu, Chao Shen
cs.AI
Abstract
Thanks to the development of cross-modal models, text-to-video retrieval
(T2VR) is advancing rapidly, but its robustness remains largely unexamined.
Existing attacks against T2VR are designed to push videos away from queries,
i.e., suppressing the ranks of videos, while the attacks that pull videos
towards selected queries, i.e., promoting the ranks of videos, remain largely
unexplored. These attacks can be more impactful, as attackers may gain more
views/clicks for financial benefit and spread (mis)information widely. To this
end, we present the first attack against T2VR that adversarially promotes videos,
dubbed the Video Promotion attack (ViPro). We further propose Modal Refinement
(MoRe) to capture finer-grained, more intricate interactions between the visual
and textual modalities, thereby enhancing black-box transferability. Comprehensive
experiments cover 2 existing baselines, 3 leading T2VR models, 3 prevailing
datasets with over 10k videos, evaluated under 3 scenarios. All experiments are
conducted in a multi-target setting to reflect realistic scenarios where
attackers seek to promote the video regarding multiple queries simultaneously.
We also evaluate our attacks against defences and for imperceptibility. Overall,
ViPro outperforms the other baselines by over 30/10/4% on average in the
white/grey/black-box settings. Our work highlights an overlooked vulnerability,
provides a qualitative analysis of the upper/lower bounds of our attacks, and
offers insights into potential countermeasures. Code will be publicly available at
https://github.com/michaeltian108/ViPro.
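The sketch below makes the idea of promoting a video concrete. It is not the authors' code and does not implement ViPro or MoRe. It uses a deliberately simplified setup. A hypothetical linear "video encoder" W stands in for a real T2VR model. The function names (promotion_objective, pgd_promote, rank_of), the budget eps and the step size alpha are all illustrative assumptions. The video is perturbed inside an L-infinity ball with standard PGD (sign-gradient ascent followed by projection). The goal is to raise the average cosine similarity between the video embedding and several target query embeddings at once, which mirrors the multi-target setting described above.

```python
import numpy as np


def normalize(v):
    """Scale vectors to unit L2 norm along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def promotion_objective(x, W, queries):
    """Mean cosine similarity between the embedding of video x and each target query,
    plus its gradient w.r.t. x, for the toy (hypothetical) linear encoder z = W @ x."""
    z = W @ x
    n = np.linalg.norm(z)
    u = z / n                                  # unit video embedding
    q = normalize(queries)                     # (m, d) unit query embeddings
    sims = q @ u                               # one cosine similarity per query
    # d(q . z/|z|)/dz = (q - (q . u) u) / |z|, averaged over the m queries
    grad_z = (q - sims[:, None] * u[None, :]).mean(axis=0) / n
    return sims.mean(), W.T @ grad_z


def pgd_promote(x, W, queries, eps=0.05, alpha=0.01, steps=40):
    """L-infinity PGD that increases the objective, i.e. pulls the video toward the queries."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        _, g = promotion_objective(x + delta, W, queries)
        delta = np.clip(delta + alpha * np.sign(g), -eps, eps)  # ascent step + projection
        delta = np.clip(x + delta, 0.0, 1.0) - x                 # keep inputs in [0, 1]
    return x + delta


def rank_of(video_idx, query, gallery):
    """1-based rank of gallery[video_idx] for a query embedding (1 = top result)."""
    sims = normalize(gallery) @ normalize(query)
    return int((sims > sims[video_idx]).sum()) + 1


rng = np.random.default_rng(0)
D, d, m = 64, 16, 3                                  # input dim, embedding dim, number of target queries
W = rng.normal(size=(d, D))                          # hypothetical stand-in for a video encoder
x = rng.uniform(0.0, 1.0, size=D)                    # the (flattened) video to promote
queries = rng.normal(size=(m, d))                    # embeddings of the target text queries
others = rng.uniform(0.0, 1.0, size=(99, D)) @ W.T   # embeddings of 99 competing videos

x_adv = pgd_promote(x, W, queries)
for name, video in [("clean", x), ("adversarial", x_adv)]:
    gallery = np.vstack([W @ video, others])         # the target video sits at index 0
    ranks = [rank_of(0, q, gallery) for q in queries]
    print(name, "mean sim:", round(float(promotion_objective(video, W, queries)[0]), 3), "ranks:", ranks)
```

The example combines the target queries by simple averaging, which is the easiest multi-target aggregation. Because it maximizes the mean similarity, it does not guarantee that the video's rank improves for every individual query.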