Frequency Dynamic Convolution for Dense Image Prediction
March 24, 2025
Authors: Linwei Chen, Lin Gu, Liang Li, Chenggang Yan, Ying Fu
cs.AI
Abstract
While Dynamic Convolution (DY-Conv) has shown promising performance by
enabling adaptive weight selection through multiple parallel weights combined
with an attention mechanism, the frequency response of these weights tends to
exhibit high similarity, resulting in high parameter costs but limited
adaptability. In this work, we introduce Frequency Dynamic Convolution
(FDConv), a novel approach that mitigates these limitations by learning a fixed
parameter budget in the Fourier domain. FDConv divides this budget into
frequency-based groups with disjoint Fourier indices, enabling the construction
of frequency-diverse weights without increasing the parameter cost. To further
enhance adaptability, we propose Kernel Spatial Modulation (KSM) and Frequency
Band Modulation (FBM). KSM dynamically adjusts the frequency response of each
filter at the spatial level, while FBM decomposes weights into distinct
frequency bands in the frequency domain and modulates them dynamically based on
local content. Extensive experiments on object detection, segmentation, and
classification validate the effectiveness of FDConv. We demonstrate that when
applied to ResNet-50, FDConv achieves superior performance with a modest
increase of +3.6M parameters, outperforming previous methods that require
substantial increases in parameter budgets (e.g., CondConv +90M, KW +76.5M).
Moreover, FDConv integrates seamlessly into a variety of architectures,
including ConvNeXt and Swin-Transformer, offering a flexible and efficient
solution for modern vision tasks. The code is publicly available at
https://github.com/Linwei-Chen/FDConv.
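The core idea of the abstract, a fixed Fourier-domain parameter budget partitioned into groups with disjoint frequency indices, each yielding one kernel via an inverse FFT, can be illustrated with a minimal sketch. This is an assumption-laden toy (the function name, the interleaved radial partition, and the real-valued budget are illustrative choices, not the paper's exact construction):

```python
import numpy as np

def frequency_diverse_kernels(fourier_budget, n_groups):
    """Split ONE learned Fourier-domain parameter tensor into n_groups
    kernels with mutually disjoint frequency support.

    Illustrative sketch only: the partition rule here (interleaving
    coefficients sorted by radial frequency) is a stand-in for FDConv's
    actual grouping scheme.
    """
    K = fourier_budget.shape[0]  # K x K Fourier-domain budget

    # Radial frequency of every Fourier coefficient in the K x K plane.
    fy = np.fft.fftfreq(K)[:, None]
    fx = np.fft.fftfreq(K)[None, :]
    radius = np.sqrt(fy**2 + fx**2)

    # Rank all K*K coefficients by radius, then deal them out round-robin
    # so the n_groups index sets are disjoint and cover the whole plane.
    order = np.argsort(radius, axis=None)
    kernels = []
    for g in range(n_groups):
        mask = np.zeros(K * K, dtype=bool)
        mask[order[g::n_groups]] = True  # this group's Fourier indices
        spec = np.where(mask.reshape(K, K), fourier_budget, 0.0)
        # Inverse FFT turns the masked spectrum into a spatial kernel.
        kernels.append(np.fft.ifft2(spec).real)
    return kernels

# Usage: one 8x8 complex budget yields 4 kernels with disjoint spectra,
# i.e. frequency-diverse weights at no extra parameter cost.
budget = np.random.randn(8, 8) + 1j * np.random.randn(8, 8)
ks = frequency_diverse_kernels(budget, 4)
```

Because the masks partition the frequency plane, the total number of stored coefficients equals that of a single kernel, which is the sense in which FDConv claims frequency diversity "without increasing the parameter cost".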