Upsample Anything: A Simple and Hard to Beat Baseline for Feature Upsampling
November 20, 2025
Authors: Minseok Seo, Mark Hamilton, Changick Kim
cs.AI
Abstract
We present Upsample Anything, a lightweight test-time optimization (TTO) framework that restores low-resolution features to high-resolution, pixel-wise outputs without any training. Although Vision Foundation Models demonstrate strong generalization across diverse downstream tasks, their representations are typically downsampled by 14×/16× (e.g., ViT), which limits their direct use in pixel-level applications. Existing feature upsampling approaches depend on dataset-specific retraining or heavy implicit optimization, restricting scalability and generalization. Upsample Anything addresses these issues through a simple per-image optimization that learns an anisotropic Gaussian kernel combining spatial and range cues, effectively bridging Gaussian Splatting and Joint Bilateral Upsampling. The learned kernel acts as a universal, edge-aware operator that transfers seamlessly across architectures and modalities, enabling precise high-resolution reconstruction of features, depth, or probability maps. It runs in approximately 0.419 s per 224×224 image and achieves state-of-the-art performance on semantic segmentation, depth estimation, and both depth and probability map upsampling. Project page: https://seominseok0429.github.io/Upsample-Anything/
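To make the idea described in the abstract concrete, below is a minimal sketch, not the authors' implementation, of a joint-bilateral-style upsampler with two learnable Gaussian bandwidths (spatial and range) fitted per image at test time and then reused to upsample a low-resolution feature map. The paper's kernel is anisotropic and its optimization objective may differ; the function names (`jbu`, `fit_and_upsample`), the isotropic parameterization, and the self-reconstruction proxy loss are assumptions made here for illustration only.

```python
# Illustrative sketch (assumed details, not the authors' code): per-image test-time
# optimization of a joint-bilateral-style kernel, then reuse of that kernel to
# upsample low-resolution features guided by the high-resolution RGB image.
import torch
import torch.nn.functional as F

def jbu(lr, guide, log_sigma_s, log_sigma_r, radius=3):
    """Upsample `lr` (1, C, h, w) to the spatial size of `guide` (1, 3, H, W)
    using Gaussian weights on spatial offset and guidance-color difference."""
    H, W = guide.shape[-2:]
    lr_up = F.interpolate(lr, size=(H, W), mode="bilinear", align_corners=False)
    sigma_s, sigma_r = log_sigma_s.exp(), log_sigma_r.exp()
    num = torch.zeros_like(lr_up)
    den = torch.zeros(1, 1, H, W, device=guide.device)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # torch.roll wraps at borders; a real implementation would pad instead.
            g_shift = torch.roll(guide, shifts=(dy, dx), dims=(-2, -1))
            f_shift = torch.roll(lr_up, shifts=(dy, dx), dims=(-2, -1))
            w_s = torch.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2))
            w_r = torch.exp(-((guide - g_shift) ** 2).sum(1, keepdim=True)
                            / (2 * sigma_r ** 2))
            w = w_s * w_r
            num = num + w * f_shift
            den = den + w
    return num / (den + 1e-8)

def fit_and_upsample(feat_lr, image, steps=100, lr=0.05):
    """Fit the two bandwidths on this single image, then upsample its features."""
    log_s = torch.zeros((), device=image.device, requires_grad=True)       # spatial
    log_r = torch.full((), -1.0, device=image.device, requires_grad=True)  # range
    opt = torch.optim.Adam([log_s, log_r], lr=lr)
    img_lr = F.interpolate(image, size=feat_lr.shape[-2:], mode="bilinear",
                           align_corners=False)
    for _ in range(steps):
        opt.zero_grad()
        recon = jbu(img_lr, image, log_s, log_r)
        loss = F.mse_loss(recon, image)  # proxy objective: self-reconstruct the guide
        loss.backward()
        opt.step()
    with torch.no_grad():
        return jbu(feat_lr, image, log_s, log_r)  # reuse the fitted kernel on features
```

Because only two scalars are optimized per image, the fit is cheap and requires no training data; the same fitted kernel can in principle be applied to any low-resolution map aligned with the image, whether ViT features, depth, or probabilities, which mirrors the cross-modality transfer claimed in the abstract.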