

Upsample Anything: A Simple and Hard to Beat Baseline for Feature Upsampling

November 20, 2025
Authors: Minseok Seo, Mark Hamilton, Changick Kim
cs.AI

Abstract

We present Upsample Anything, a lightweight test-time optimization (TTO) framework that restores low-resolution features to high-resolution, pixel-wise outputs without any training. Although Vision Foundation Models demonstrate strong generalization across diverse downstream tasks, their representations are typically downsampled by 14x/16x (e.g., ViT), which limits their direct use in pixel-level applications. Existing feature upsampling approaches depend on dataset-specific retraining or heavy implicit optimization, restricting scalability and generalization. Upsample Anything addresses these issues through a simple per-image optimization that learns an anisotropic Gaussian kernel combining spatial and range cues, effectively bridging Gaussian Splatting and Joint Bilateral Upsampling. The learned kernel acts as a universal, edge-aware operator that transfers seamlessly across architectures and modalities, enabling precise high-resolution reconstruction of features, depth, or probability maps. It runs in approximately 0.419 s per 224x224 image and achieves state-of-the-art performance on semantic segmentation, depth estimation, and both depth and probability map upsampling. Project page: https://seominseok0429.github.io/Upsample-Anything/
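As a rough illustration of the kind of edge-aware, guidance-driven operator the abstract describes, the sketch below implements plain joint bilateral upsampling in NumPy: each high-resolution output pixel is a weighted average of nearby low-resolution features, where the weight multiplies a spatial Gaussian by a range Gaussian over guidance-image colors. The function name, the fixed isotropic sigmas, and the window radius are illustrative assumptions; the paper's method instead learns a per-pixel anisotropic Gaussian kernel by test-time optimization and is not reproduced here.

```python
# Minimal, illustrative sketch (not the authors' code) of joint bilateral
# upsampling: restore a low-res feature map to the resolution of a high-res
# guidance image using spatial x range Gaussian weights.
import numpy as np

def joint_bilateral_upsample(lr_feat, hr_guide, sigma_s=1.0, sigma_r=0.1, radius=2):
    """lr_feat: (h, w, c) low-res features; hr_guide: (H, W, 3) guidance image in [0, 1]."""
    h, w, c = lr_feat.shape
    H, W, _ = hr_guide.shape
    scale_y, scale_x = h / H, w / W  # high-res -> low-res coordinate scaling
    out = np.zeros((H, W, c), dtype=np.float64)

    for y in range(H):
        for x in range(W):
            # Continuous position of this high-res pixel in low-res coordinates.
            cy, cx = y * scale_y, x * scale_x
            y0, x0 = int(round(cy)), int(round(cx))
            acc = np.zeros(c)
            wsum = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y0 + dy, x0 + dx
                    if not (0 <= yy < h and 0 <= xx < w):
                        continue
                    # Spatial term: isotropic Gaussian in low-res coordinates
                    # (the paper learns an anisotropic covariance instead).
                    ds2 = (yy - cy) ** 2 + (xx - cx) ** 2
                    w_s = np.exp(-ds2 / (2 * sigma_s ** 2))
                    # Range term: similarity between the guidance color at the
                    # output pixel and at the neighbor's corresponding location.
                    g_out = hr_guide[y, x]
                    g_nbr = hr_guide[min(int(yy / scale_y), H - 1),
                                     min(int(xx / scale_x), W - 1)]
                    dr2 = np.sum((g_out - g_nbr) ** 2)
                    w_r = np.exp(-dr2 / (2 * sigma_r ** 2))
                    weight = w_s * w_r
                    acc += weight * lr_feat[yy, xx]
                    wsum += weight
            out[y, x] = acc / max(wsum, 1e-8)
    return out

# Toy usage: upsample an 8x8, 4-channel feature map to a 64x64 guidance grid.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((8, 8, 4))
    guide = rng.random((64, 64, 3))
    print(joint_bilateral_upsample(feat, guide).shape)  # (64, 64, 4)
```

This pure-Python loop is unoptimized and only meant to make the spatial/range weighting concrete; a practical version would be vectorized or run on GPU, and the ViT-style 14x/16x downsampled features would serve as `lr_feat` with the input image as `hr_guide`.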