Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations

Abstract

This paper proposes the paradigm of large convolutional kernels for designing modern Convolutional Neural Networks (ConvNets). We establish that employing a few large kernels, instead of stacking multiple smaller ones, can be a superior design strategy. Our work introduces a set of architecture design guidelines for large-kernel ConvNets that optimize their efficiency and performance. We propose the UniRepLKNet architecture, which offers systematic architecture design principles specifically crafted for large-kernel ConvNets, emphasizing their unique ability to capture extensive spatial information without deep layer stacking. The resulting model not only surpasses its predecessors, with an ImageNet accuracy of 88.0%, an ADE20K mIoU of 55.6%, and a COCO box AP of 56.4%, but also demonstrates impressive scalability and performance on various modalities such as time-series forecasting, audio, point cloud, and video recognition. These results indicate the universal modeling ability of large-kernel ConvNets, which also run inference faster than vision transformers. Our findings reveal that large-kernel ConvNets possess larger effective receptive fields and a higher shape bias, moving away from the texture bias typical of smaller-kernel CNNs. All code and models are publicly available at https://github.com/AILab-CVC/UniRepLKNet, promoting further research and development in the community.
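To make the "few large kernels versus many stacked small kernels" trade-off concrete, the following is a minimal sketch that computes the theoretical receptive field of a stack of convolution layers and the number of depthwise weights per channel. It is not taken from the UniRepLKNet code: the helper names `receptive_field` and `depthwise_weights_per_channel` are hypothetical, and the kernel sizes 3 and 13 are illustrative assumptions, not values stated in the abstract. The theoretical receptive field is only an upper bound on the effective receptive field the abstract discusses.

```python
def receptive_field(kernel_sizes, strides=None):
    """Theoretical receptive field (along one axis) of stacked conv layers.

    Assumes no dilation. `strides` defaults to 1 for every layer.
    """
    strides = strides or [1] * len(kernel_sizes)
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf


def depthwise_weights_per_channel(kernel_sizes):
    """Number of weights per channel if every layer is a depthwise k x k conv."""
    return sum(k * k for k in kernel_sizes)


# Illustrative assumption: six stacked 3x3 layers versus one 13x13 layer.
small_stack = [3] * 6
large_single = [13]

for name, layers in [("6 x (3x3)", small_stack), ("1 x (13x13)", large_single)]:
    print(name,
          "receptive field:", receptive_field(layers),
          "weights per channel:", depthwise_weights_per_channel(layers))
```

Under these assumptions both designs cover the same 13-pixel theoretical receptive field, but the single large kernel does so in one layer rather than six, which is the sense in which large kernels capture wide spatial context "without deep layer stacking". The large kernel uses more weights per channel in this toy comparison, so the sketch says nothing about which design is more accurate or efficient in practice.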
