

LSNet: See Large, Focus Small

March 29, 2025
Authors: Ao Wang, Hui Chen, Zijia Lin, Jungong Han, Guiguang Ding
cs.AI

Abstract

Vision network designs, including Convolutional Neural Networks and Vision Transformers, have significantly advanced the field of computer vision. Yet, their complex computations pose challenges for practical deployments, particularly in real-time applications. To tackle this issue, researchers have explored various lightweight and efficient network designs. However, existing lightweight models predominantly leverage self-attention mechanisms and convolutions for token mixing. This dependence brings limitations in effectiveness and efficiency in the perception and aggregation processes of lightweight networks, hindering the balance between performance and efficiency under limited computational budgets. In this paper, we draw inspiration from the dynamic heteroscale vision ability inherent in the efficient human vision system and propose a ``See Large, Focus Small'' strategy for lightweight vision network design. We introduce LS (Large-Small) convolution, which combines large-kernel perception and small-kernel aggregation. It can efficiently capture a wide range of perceptual information and achieve precise feature aggregation for dynamic and complex visual representations, thus enabling proficient processing of visual information. Based on LS convolution, we present LSNet, a new family of lightweight models. Extensive experiments demonstrate that LSNet achieves superior performance and efficiency over existing lightweight networks in various vision tasks. Codes and models are available at https://github.com/jameslahm/lsnet.
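
The abstract describes LS convolution only at a high level: a large kernel to perceive broad context, then a small kernel to aggregate fine-grained features. Below is a minimal, hypothetical PyTorch sketch of that "See Large, Focus Small" structure. The class name LSConvSketch, the kernel sizes, and the exact layer layout are illustrative assumptions on my part, not the paper's actual LS convolution; the official implementation is in the linked repository.

```python
# Minimal sketch of the "See Large, Focus Small" idea in PyTorch.
# This is an illustrative approximation, NOT the authors' exact LS convolution;
# see https://github.com/jameslahm/lsnet for the official code.
import torch
import torch.nn as nn


class LSConvSketch(nn.Module):
    """Hypothetical block: large-kernel perception, then small-kernel aggregation."""

    def __init__(self, dim: int, large_k: int = 7, small_k: int = 3):
        super().__init__()
        # Large-kernel depthwise conv: cheap, wide receptive field ("see large").
        self.perception = nn.Conv2d(dim, dim, large_k, padding=large_k // 2, groups=dim)
        # Small-kernel depthwise conv: precise local feature aggregation ("focus small").
        self.aggregation = nn.Conv2d(dim, dim, small_k, padding=small_k // 2, groups=dim)
        # Pointwise conv for channel mixing.
        self.proj = nn.Conv2d(dim, dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        context = self.perception(x)     # broad perceptual context
        out = self.aggregation(context)  # fine-grained aggregation over that context
        return self.proj(out)


# Usage: a 4D feature map (batch, channels, height, width)
if __name__ == "__main__":
    x = torch.randn(1, 64, 56, 56)
    block = LSConvSketch(64)
    print(block(x).shape)  # torch.Size([1, 64, 56, 56])
```

Depthwise convolutions are used here because they keep the cost of the large kernel low, which matches the paper's stated goal of balancing performance and efficiency under limited computational budgets.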