Adaptive Frequency Filters As Efficient Global Token Mixers
July 26, 2023
Authors: Zhipeng Huang, Zhizheng Zhang, Cuiling Lan, Zheng-Jun Zha, Yan Lu, Baining Guo
cs.AI
Abstract
Recent vision transformers, large-kernel CNNs and MLPs have attained
remarkable successes in broad vision tasks thanks to their effective
information fusion in the global scope. However, their efficient deployment,
especially on mobile devices, still faces notable challenges due to
the heavy computational costs of self-attention mechanisms, large kernels, or
fully connected layers. In this work, we apply the conventional convolution
theorem to deep learning to address this and reveal that adaptive frequency
filters can serve as efficient global token mixers. With this insight, we propose
the Adaptive Frequency Filtering (AFF) token mixer. This neural operator transfers
a latent representation to the frequency domain via a Fourier transform and
performs semantic-adaptive frequency filtering via an elementwise
multiplication, which is mathematically equivalent to a token mixing operation in the
original latent space with a dynamic convolution kernel as large as the spatial
resolution of this latent representation. We take AFF token mixers as primary
neural operators to build a lightweight neural network, dubbed AFFNet.
Extensive experiments demonstrate the effectiveness of our proposed AFF token
mixer and show that AFFNet achieves superior accuracy and efficiency trade-offs
compared to other lightweight network designs on broad visual tasks, including
visual recognition and dense prediction tasks.
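
The equivalence the abstract relies on, that elementwise multiplication in the frequency domain matches a convolution whose kernel spans the whole input, can be checked numerically. The following is a minimal sketch under stated assumptions, not the AFFNet implementation: it uses a 1D sequence of scalar tokens instead of a 2D feature map, a naive O(N^2) DFT instead of an FFT, and a fixed illustrative kernel where the paper's filter would be predicted from the input. The names `dft`, `circular_conv`, and `frequency_filter` are hypothetical helpers introduced only for this illustration.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (inverse when requested)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def circular_conv(x, kernel):
    """Direct circular convolution; the kernel covers all N token positions."""
    n = len(x)
    return [sum(kernel[j] * x[(i - j) % n] for j in range(n)) for i in range(n)]


def frequency_filter(x, kernel):
    """Token mixing as an elementwise product in the frequency domain."""
    x_freq = dft(x)
    k_freq = dft(kernel)  # assumption: fixed here, input-dependent in the paper
    y_freq = [a * b for a, b in zip(x_freq, k_freq)]
    # Inputs are real, so the imaginary part is only rounding noise.
    return [v.real for v in dft(y_freq, inverse=True)]


# Illustrative values only; eight scalar "tokens" and a kernel of equal length.
tokens = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 1.5]
kernel = [0.5, -0.25, 0.1, 0.0, 0.3, -0.1, 0.2, 0.05]

spatial = circular_conv(tokens, kernel)
spectral = frequency_filter(tokens, kernel)
max_err = max(abs(a - b) for a, b in zip(spatial, spectral))
print("max abs difference:", max_err)
```

The two results should agree up to floating-point rounding. With an FFT in place of the naive transform, the frequency-domain route costs O(N log N) rather than the O(N^2) of the direct global convolution, which is the efficiency argument the abstract makes.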