Processing and acquisition traces in visual encoders: What does CLIP know about your camera?
August 14, 2025
Authors: Ryan Ramos, Vladan Stojnić, Giorgos Kordopatis-Zilos, Yuta Nakashima, Giorgos Tolias, Noa Garcia
cs.AI
Abstract
Prior work has analyzed the robustness of visual encoders to image
transformations and corruptions, particularly in cases where such alterations
are not seen during training. When this occurs, they introduce a form of
distribution shift at test time, often leading to performance degradation. The
primary focus has been on severe corruptions that, when applied aggressively,
distort useful signals necessary for accurate semantic predictions.
We take a different perspective by analyzing parameters of the image
acquisition process and transformations that may be subtle or even
imperceptible to the human eye. We find that such parameters are systematically
encoded in the learned visual representations and can be easily recovered. More
strikingly, their presence can have a profound impact, either positively or
negatively, on semantic predictions. This effect depends on whether there is a
strong correlation or anti-correlation between semantic labels and these
acquisition-based or processing-based labels. Our code and data are available
at: https://github.com/ryan-caesar-ramos/visual-encoder-traces
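The claim that subtle acquisition traces are linearly recoverable from frozen features can be illustrated with a small sketch. This is a hypothetical toy setup, not the paper's pipeline: real encoder embeddings are replaced by synthetic Gaussian features, the processing step (say, JPEG recompression) is modeled as a small fixed shift in feature space, and a nearest-class-mean linear probe stands in for whatever probe the authors use.

```python
import numpy as np

# Hypothetical illustration, not the paper's pipeline: simulate frozen-encoder
# embeddings in which a processing step (e.g. JPEG recompression) leaves a
# small but consistent trace, then recover the processing label linearly.
rng = np.random.default_rng(0)
d, n = 512, 1000  # embedding dim (CLIP ViT-B/32 size), images per class

clean = rng.normal(size=(n, d))              # stand-in "unprocessed" features
trace = rng.normal(size=d)
trace *= 5.0 / np.linalg.norm(trace)         # fixed shift, ~20% of feature norm
processed = rng.normal(size=(n, d)) + trace  # "processed" features carry it

X = np.vstack([clean, processed])
y = np.concatenate([np.zeros(n), np.ones(n)])

# Train/test split, then fit a nearest-class-mean linear probe.
idx = rng.permutation(2 * n)
train, test = idx[:1500], idx[1500:]
mu0 = X[train][y[train] == 0].mean(axis=0)
mu1 = X[train][y[train] == 1].mean(axis=0)
w = mu1 - mu0                # probe direction: difference of class means
b = -(mu0 + mu1) @ w / 2     # midpoint decision threshold
acc = ((X[test] @ w + b > 0) == y[test]).mean()
print(f"probe accuracy: {acc:.2f}")  # near-perfect despite the small shift
```

The same mechanism explains the abstract's second finding: if such a trace direction happens to correlate with a semantic label in the training data, a semantic classifier can latch onto it, helping or hurting depending on whether the correlation holds at test time.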