Less is More: Focus Attention for Efficient DETR

Abstract

DETR-like models have significantly boosted the performance of detectors and even outperformed classical convolutional models. However, the traditional encoder structure treats all tokens equally, without discrimination, which imposes a redundant computational burden. Recent sparsification strategies exploit a subset of informative tokens to reduce attention complexity while maintaining performance through a sparse encoder. Yet these methods tend to rely on unreliable model statistics. Moreover, simply reducing the number of tokens substantially degrades detection performance, which limits the application of these sparse models. We propose Focus-DETR, which focuses attention on more informative tokens for a better trade-off between computational efficiency and model accuracy. Specifically, we reconstruct the encoder with dual attention, which includes a token scoring mechanism that considers both the localization and the category semantic information of objects from multi-scale feature maps. Based on these scores, we efficiently discard background queries and enhance the semantic interaction among fine-grained object queries. Compared with state-of-the-art sparse DETR-like detectors under the same setting, Focus-DETR achieves 50.4 AP (+2.2) on COCO with comparable complexity. The code is available at https://github.com/huawei-noah/noah-research/tree/master/Focus-DETR and https://gitee.com/mindspore/models/tree/master/research/cv/Focus-DETR.
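The general idea of scoring tokens, discarding low-scoring background tokens, and spending attention only on the highest-scoring object tokens can be illustrated with a minimal NumPy sketch. This is not the Focus-DETR implementation. The following are all hypothetical assumptions made for illustration: combining the localization and category scores as a product, the `keep_ratio` and `top_k` cutoffs, and the function names `score_tokens`, `select_tokens`, and `focus_update`. The attention step is also simplified: it is single-head, with no learned projections.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def score_tokens(fg_logits: np.ndarray, cls_logits: np.ndarray) -> np.ndarray:
    """Combine a localization (foreground) score with a category score per token.

    ASSUMPTION: score = sigmoid(fg) * max_c sigmoid(cls_c). The abstract only says
    both kinds of information are considered, not how they are combined.
    """
    return sigmoid(fg_logits) * sigmoid(cls_logits).max(axis=1)


def select_tokens(scores: np.ndarray, keep_ratio: float = 0.3, top_k: int = 100):
    """Return (foreground indices, fine-grained object indices), highest score first."""
    n_keep = max(1, int(round(keep_ratio * scores.shape[0])))
    order = np.argsort(-scores, kind="stable")
    fg_idx = order[:n_keep]               # kept tokens; the rest are treated as background
    obj_idx = order[:min(top_k, n_keep)]  # subset that receives extra semantic interaction
    return fg_idx, obj_idx


def self_attention(x: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product self-attention without learned projections."""
    logits = x @ x.T / np.sqrt(x.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x


def focus_update(tokens, fg_logits, cls_logits, keep_ratio=0.3, top_k=100):
    """One simplified update in which only the top-scoring object tokens attend to each other.

    tokens: (N, D) tokens flattened across all feature-map levels.
    Background tokens are returned unchanged, so no attention is spent on them.
    Foreground tokens outside obj_idx would be refined by the sparse encoder in a
    full model. That step is omitted here, so they are also returned unchanged.
    """
    scores = score_tokens(fg_logits, cls_logits)
    fg_idx, obj_idx = select_tokens(scores, keep_ratio, top_k)
    out = tokens.copy()
    obj = tokens[obj_idx]
    out[obj_idx] = obj + self_attention(obj)  # residual update for object tokens only
    return out, fg_idx, obj_idx
```

Consider 50 tokens with `keep_ratio=0.3` and `top_k=10`. The sketch keeps 15 foreground tokens, and only 10 of them take part in self-attention. The attention matrix therefore has 10 × 10 = 100 entries instead of 50 × 50 = 2500. This quadratic saving is the kind of efficiency gain that motivates score-based sparsification.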

