

NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering

November 23, 2025
作者: Loick Chambon, Paul Couairon, Eloi Zablocki, Alexandre Boulch, Nicolas Thome, Matthieu Cord
cs.AI

Abstract

Vision Foundation Models (VFMs) extract spatially downsampled representations, posing challenges for pixel-level tasks. Existing upsampling approaches face a fundamental trade-off: classical filters are fast and broadly applicable but rely on fixed forms, while modern upsamplers achieve superior accuracy through learnable, VFM-specific forms at the cost of retraining for each VFM. We introduce Neighborhood Attention Filtering (NAF), which bridges this gap by learning adaptive spatial-and-content weights through Cross-Scale Neighborhood Attention and Rotary Position Embeddings (RoPE), guided solely by the high-resolution input image. NAF operates zero-shot: it upsamples features from any VFM without retraining, making it the first VFM-agnostic architecture to outperform VFM-specific upsamplers and achieve state-of-the-art performance across multiple downstream tasks. It maintains high efficiency, scaling to 2K feature maps and reconstructing intermediate-resolution maps at 18 FPS. Beyond feature upsampling, NAF demonstrates strong performance on image restoration, highlighting its versatility. Code and checkpoints are available at https://github.com/valeoai/NAF.
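To make the core idea concrete, below is a minimal, illustrative NumPy sketch of image-guided upsampling via neighborhood attention with rotary position embeddings. It is an assumption-laden toy, not the authors' implementation: the function names (`naf_upsample`, `rope_1d`, `rope_2d`), the choice of a guidance embedding with the same channel width as the features, the axial RoPE split, and the absence of any learned projections are all simplifications for exposition; the real NAF learns its query/key mappings and is far more efficient than this per-pixel loop.

```python
import numpy as np

def rope_1d(x, pos, base=50.0):
    """Rotate channel pairs of x (..., d) by angles proportional to pos (...,)."""
    d = x.shape[-1]                                   # d must be even
    half = d // 2
    freqs = base ** (-np.arange(half) / half)         # per-pair frequencies
    ang = np.asarray(pos, dtype=float)[..., None] * freqs
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def rope_2d(x, ys, xs):
    """Axial 2D RoPE: first half of channels encodes y, second half x."""
    half = x.shape[-1] // 2                           # channels divisible by 4
    return np.concatenate([rope_1d(x[..., :half], ys),
                           rope_1d(x[..., half:], xs)], axis=-1)

def naf_upsample(feats, guide, k=3):
    """Toy cross-scale neighborhood attention.

    feats: (h, w, c) low-resolution features; guide: (H, W, c) high-resolution
    guidance embedding (hypothetical: same channel width as feats).
    Each high-res pixel queries a k x k low-res neighborhood; RoPE injects
    relative (sub-pixel) positions into the query/key dot product.
    """
    h, w, c = feats.shape
    H, W, _ = guide.shape
    sy, sx = H / h, W / w
    r = k // 2
    out = np.zeros((H, W, c))
    for i in range(H):
        for j in range(W):
            # continuous low-res coordinates of this high-res pixel
            cy, cx = (i + 0.5) / sy - 0.5, (j + 0.5) / sx - 0.5
            yi = np.clip(int(round(cy)) + np.arange(-r, r + 1), 0, h - 1)
            xi = np.clip(int(round(cx)) + np.arange(-r, r + 1), 0, w - 1)
            keys = feats[np.ix_(yi, xi)].reshape(-1, c)       # (k*k, c)
            ky = np.repeat(yi, k).astype(float)               # key y-positions
            kx = np.tile(xi, k).astype(float)                 # key x-positions
            q = rope_2d(guide[i, j][None, :], np.array([cy]), np.array([cx]))[0]
            kk = rope_2d(keys, ky, kx)
            logits = kk @ q / np.sqrt(c)
            wts = np.exp(logits - logits.max())
            wts /= wts.sum()                                  # softmax weights
            out[i, j] = wts @ keys        # convex combination of LR features
    return out
```

Because the output at every pixel is a softmax-weighted (convex) combination of low-resolution features, the upsampled map stays within the value range of the input features; only the spatial-and-content weighting, driven by the high-resolution guide, changes across pixels.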