Frequency Dynamic Convolution for Dense Image Prediction
March 24, 2025
Authors: Linwei Chen, Lin Gu, Liang Li, Chenggang Yan, Ying Fu
cs.AI
Abstract
While Dynamic Convolution (DY-Conv) has shown promising performance by
enabling adaptive weight selection through multiple parallel weights combined
with an attention mechanism, the frequency response of these weights tends to
exhibit high similarity, resulting in high parameter costs but limited
adaptability. In this work, we introduce Frequency Dynamic Convolution
(FDConv), a novel approach that mitigates these limitations by learning a fixed
parameter budget in the Fourier domain. FDConv divides this budget into
frequency-based groups with disjoint Fourier indices, enabling the construction
of frequency-diverse weights without increasing the parameter cost. To further
enhance adaptability, we propose Kernel Spatial Modulation (KSM) and Frequency
Band Modulation (FBM). KSM dynamically adjusts the frequency response of each
filter at the spatial level, while FBM decomposes weights into distinct
frequency bands in the frequency domain and modulates them dynamically based on
local content. Extensive experiments on object detection, segmentation, and
classification validate the effectiveness of FDConv. We demonstrate that when
applied to ResNet-50, FDConv achieves superior performance with a modest
increase of +3.6M parameters, outperforming previous methods that require
substantial increases in parameter budgets (e.g., CondConv +90M, KW +76.5M).
Moreover, FDConv integrates seamlessly into a variety of architectures,
including ConvNeXt and Swin-Transformer, offering a flexible and efficient
solution for modern vision tasks. The code is publicly available at
https://github.com/Linwei-Chen/FDConv.
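The core idea of the abstract, a fixed Fourier-domain parameter budget partitioned into groups with disjoint frequency indices, each yielding one kernel via an inverse FFT, can be illustrated with a minimal sketch. This is an assumption-laden toy (the function name, the interleaved radial partition, and the real-valued budget are illustrative choices, not the paper's exact construction):

```python
import numpy as np

def frequency_diverse_kernels(fourier_budget, n_groups):
    """Split ONE learned Fourier-domain parameter tensor into n_groups
    kernels with mutually disjoint frequency support.

    Illustrative sketch only: the partition rule here (interleaving
    coefficients sorted by radial frequency) is a stand-in for FDConv's
    actual grouping scheme.
    """
    K = fourier_budget.shape[0]  # K x K Fourier-domain budget

    # Radial frequency of every Fourier coefficient in the K x K plane.
    fy = np.fft.fftfreq(K)[:, None]
    fx = np.fft.fftfreq(K)[None, :]
    radius = np.sqrt(fy**2 + fx**2)

    # Rank all K*K coefficients by radius, then deal them out round-robin
    # so the n_groups index sets are disjoint and cover the whole plane.
    order = np.argsort(radius, axis=None)
    kernels = []
    for g in range(n_groups):
        mask = np.zeros(K * K, dtype=bool)
        mask[order[g::n_groups]] = True  # this group's Fourier indices
        spec = np.where(mask.reshape(K, K), fourier_budget, 0.0)
        # Inverse FFT turns the masked spectrum into a spatial kernel.
        kernels.append(np.fft.ifft2(spec).real)
    return kernels

# Usage: one 8x8 complex budget yields 4 kernels with disjoint spectra,
# i.e. frequency-diverse weights at no extra parameter cost.
budget = np.random.randn(8, 8) + 1j * np.random.randn(8, 8)
ks = frequency_diverse_kernels(budget, 4)
```

Because the masks partition the frequency plane, the total number of stored coefficients equals that of a single kernel, which is the sense in which FDConv claims frequency diversity "without increasing the parameter cost".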