Frequency Dynamic Convolution for Dense Image Prediction
March 24, 2025
Authors: Linwei Chen, Lin Gu, Liang Li, Chenggang Yan, Ying Fu
cs.AI
Abstract
While Dynamic Convolution (DY-Conv) has shown promising performance by
enabling adaptive weight selection through multiple parallel weights combined
with an attention mechanism, the frequency response of these weights tends to
exhibit high similarity, resulting in high parameter costs but limited
adaptability. In this work, we introduce Frequency Dynamic Convolution
(FDConv), a novel approach that mitigates these limitations by learning a fixed
parameter budget in the Fourier domain. FDConv divides this budget into
frequency-based groups with disjoint Fourier indices, enabling the construction
of frequency-diverse weights without increasing the parameter cost. To further
enhance adaptability, we propose Kernel Spatial Modulation (KSM) and Frequency
Band Modulation (FBM). KSM dynamically adjusts the frequency response of each
filter at the spatial level, while FBM decomposes weights into distinct
frequency bands in the frequency domain and modulates them dynamically based on
local content. Extensive experiments on object detection, segmentation, and
classification validate the effectiveness of FDConv. We demonstrate that when
applied to ResNet-50, FDConv achieves superior performance with a modest
increase of +3.6M parameters, outperforming previous methods that require
substantial increases in parameter budgets (e.g., CondConv +90M, KW +76.5M).
Moreover, FDConv integrates seamlessly into a variety of architectures,
including ConvNeXt and Swin-Transformer, offering a flexible and efficient
solution for modern vision tasks. The code is made publicly available at
https://github.com/Linwei-Chen/FDConv.
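The core idea of the abstract — carving one fixed parameter budget into groups with disjoint Fourier indices so that each resulting kernel occupies a different frequency band — can be illustrated with a toy sketch. This is not the authors' implementation (see their repository for that); the function name, group ordering, and kernel size below are illustrative assumptions only.

```python
import numpy as np

# Toy sketch (NOT the authors' code): build n frequency-diverse 2D kernels
# from ONE shared parameter budget by assigning each kernel a disjoint set
# of Fourier indices, then inverse-transforming each subset.
def frequency_diverse_kernels(budget, n_groups, size=16):
    # budget: real (size, size) array holding the shared parameters,
    # interpreted here as a Fourier-domain coefficient map.
    coords = [(u, v) for u in range(size) for v in range(size)]
    # Order all Fourier indices by radial frequency (low -> high), then
    # partition them into n_groups disjoint sets via strided slicing.
    coords.sort(key=lambda c: min(c[0], size - c[0]) ** 2
                              + min(c[1], size - c[1]) ** 2)
    groups = [coords[i::n_groups] for i in range(n_groups)]  # disjoint sets
    kernels = []
    for g in groups:
        spec = np.zeros((size, size), dtype=complex)
        for (u, v) in g:
            spec[u, v] = budget[u, v]  # each coefficient used by ONE kernel
        # Inverse FFT maps each disjoint coefficient subset to a spatial kernel.
        kernels.append(np.fft.ifft2(spec).real)
    return kernels

rng = np.random.default_rng(0)
budget = rng.standard_normal((16, 16))
ks = frequency_diverse_kernels(budget, n_groups=4)
# The total parameter count stays at one 16x16 budget, yet we obtain 4
# kernels drawn from non-overlapping sets of Fourier coefficients.
```

The point of the sketch is the accounting: the four kernels together consume exactly one budget's worth of parameters, which mirrors how FDConv obtains diverse weights without the large parameter increases of methods like CondConv. The paper's additional KSM and FBM modulation steps are omitted here.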