UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders

January 25, 2026
Authors: Matthew Walmer, Saksham Suri, Anirud Aggarwal, Abhinav Shrivastava
cs.AI

Abstract

The space of task-agnostic feature upsampling has emerged as a promising area of research to efficiently create denser features from pre-trained visual backbones. These methods act as a shortcut to achieve dense features for a fraction of the cost by learning to map low-resolution features to high-resolution versions. While early works in this space used iterative upsampling approaches, more recent works have switched to cross-attention-based methods, which risk falling into the same efficiency scaling problems of the backbones they are upsampling. In this work, we demonstrate that iterative upsampling methods can still compete with cross-attention-based methods; moreover, they can achieve state-of-the-art performance with lower inference costs. We propose UPLiFT, an architecture for Universal Pixel-dense Lightweight Feature Transforms. We also propose an efficient Local Attender operator to overcome the limitations of prior iterative feature upsampling methods. This operator uses an alternative attentional pooling formulation defined fully locally. We show that our Local Attender allows UPLiFT to maintain stable features throughout upsampling, enabling state-of-the-art performance with lower inference costs than existing pixel-dense feature upsamplers. In addition, we apply UPLiFT to generative downstream tasks and show that it achieves competitive performance with state-of-the-art Coupled Flow Matching models for VAE feature upsampling. Altogether, UPLiFT offers a versatile and efficient approach to creating denser features.
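
The abstract describes the Local Attender only as an attentional pooling formulation defined fully locally, so the following is a minimal sketch of what one generic step of that kind could look like, not the paper's operator. It assumes a fixed 3x3 neighborhood, nearest-neighbor 2x upsampling before pooling, and per-pixel attention logits supplied from outside (in practice these would presumably be predicted from guidance features). The names `OFFSETS`, `upsample_nearest_2x`, and `local_attend` are hypothetical.

```python
import math

# Hypothetical fixed neighborhood of (dy, dx) offsets; the paper's choice may differ.
OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def upsample_nearest_2x(values):
    """Double the height and width of an H x W x C nested list by repetition."""
    out = []
    for row in values:
        wide = [list(v) for v in row for _ in range(2)]
        out.append(wide)
        out.append([list(v) for v in wide])
    return out


def local_attend(values, logits):
    """Pool each pixel from its local neighborhood using softmax weights.

    values: H x W x C nested lists of features.
    logits: H x W x K nested lists, one score per entry of OFFSETS.
    Each output feature is a convex combination of nearby input features.
    """
    h, w, c = len(values), len(values[0]), len(values[0][0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            weights = softmax(logits[y][x])
            pooled = [0.0] * c
            for wgt, (dy, dx) in zip(weights, OFFSETS):
                ny = min(max(y + dy, 0), h - 1)  # clamp at the borders
                nx = min(max(x + dx, 0), w - 1)
                for ch in range(c):
                    pooled[ch] += wgt * values[ny][nx][ch]
            row.append(pooled)
        out.append(row)
    return out


# One illustrative 2x step on a tiny 2 x 2 x 1 feature map with uniform logits.
feats = [[[1.0], [3.0]], [[5.0], [7.0]]]
up = upsample_nearest_2x(feats)
uniform = [[[0.0] * len(OFFSETS) for _ in range(4)] for _ in range(4)]
dense = local_attend(up, uniform)
```

In this sketch each output pixel looks at a constant number of neighbors, so the cost of a step grows linearly with the number of output pixels, whereas a global cross-attention upsampler compares every output pixel with every low-resolution token. Because the softmax weights are non-negative and sum to one, every output feature stays within the range of the input features, which is one plausible reading of the "stable features" claim, though the abstract does not spell out the mechanism.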