Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations

Abstract

This paper proposes a paradigm of using large convolutional kernels in the design of modern Convolutional Neural Networks (ConvNets). We establish that employing a few large kernels, instead of stacking multiple smaller ones, can be a superior design strategy. Our work introduces a set of architecture design guidelines for large-kernel ConvNets that optimize their efficiency and performance. We propose the UniRepLKNet architecture, which follows systematic design principles crafted specifically for large-kernel ConvNets and emphasizes their unique ability to capture extensive spatial information without deep layer stacking. The resulting model surpasses its predecessors with an ImageNet accuracy of 88.0%, an ADE20K mIoU of 55.6%, and a COCO box AP of 56.4%, and it also demonstrates impressive scalability and performance on other modalities such as time-series forecasting, audio, point cloud, and video recognition. These results indicate that large-kernel ConvNets have universal modeling abilities while offering faster inference than vision transformers. Our findings also reveal that large-kernel ConvNets possess larger effective receptive fields and a higher shape bias, moving away from the texture bias typical of smaller-kernel CNNs. All code and models are publicly available at https://github.com/AILab-CVC/UniRepLKNet to promote further research and development in the community.
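To make the "few large kernels versus many stacked small kernels" trade-off concrete, the sketch below computes the theoretical receptive field of a stack of convolution layers. It is an illustrative sketch, not code from the UniRepLKNet repository: the helper names `receptive_field` and `layers_needed` are hypothetical, dilation is assumed to be 1, and the kernel sizes (27 and 3) are arbitrary examples rather than the paper's settings. Note that the theoretical receptive field is only an upper bound; the effective receptive field mentioned above is measured empirically and typically grows more slowly with depth.

```python
from typing import Sequence, Tuple


def receptive_field(layers: Sequence[Tuple[int, int]]) -> int:
    """Theoretical receptive field (in input pixels) of a stack of conv layers.

    Each layer is given as (kernel_size, stride); dilation 1 is assumed.
    """
    rf = 1    # one output position initially corresponds to one input pixel
    jump = 1  # spacing, in input pixels, between adjacent positions at the current layer
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump  # each layer widens the field by (k - 1) steps
        jump *= stride                  # strides make later layers take larger steps
    return rf


def layers_needed(small_kernel: int, target_rf: int) -> int:
    """Number of stride-1 small_kernel x small_kernel layers needed to reach target_rf."""
    if small_kernel < 2:
        raise ValueError("a kernel of size < 2 never enlarges the receptive field")
    depth = 0
    while receptive_field([(small_kernel, 1)] * depth) < target_rf:
        depth += 1
    return depth


if __name__ == "__main__":
    print(receptive_field([(27, 1)]))      # a single large 27x27 kernel
    print(receptive_field([(3, 1)] * 13))  # thirteen stacked 3x3 kernels
    print(layers_needed(3, 27))            # depth of 3x3 stack matching one 27x27 layer
```

With stride 1, each k×k layer adds k − 1 pixels, so matching one 27×27 kernel takes thirteen 3×3 layers. This is the kind of deep stacking that the large-kernel design aims to avoid.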
