Adapting Vision Foundation Models for Robust Cloud Segmentation in Remote Sensing Images

Abstract

Cloud segmentation is a critical challenge in remote sensing image interpretation, as its accuracy directly affects the effectiveness of subsequent data processing and analysis. Recently, vision foundation models (VFMs) have demonstrated powerful generalization capabilities across various visual tasks. In this paper, we present a parameter-efficient adaptive approach, termed Cloud-Adapter, designed to enhance the accuracy and robustness of cloud segmentation. Our method leverages a VFM pretrained on general-domain data, which remains frozen, so the backbone itself requires no additional training. Cloud-Adapter incorporates a lightweight spatial perception module that first uses a convolutional neural network (ConvNet) to extract dense spatial representations. These multi-scale features are then aggregated and serve as contextual inputs to an adapting module, which modulates the frozen transformer layers within the VFM. Experimental results demonstrate that Cloud-Adapter, whose trainable parameters amount to only 0.6% of the frozen backbone's parameter count, achieves substantial performance gains. Cloud-Adapter consistently attains state-of-the-art (SOTA) performance across a wide variety of cloud segmentation datasets from multiple satellite sources, sensor series, data processing levels, land cover scenarios, and annotation granularities. We have released the source code and pretrained models at https://github.com/XavierJiezou/Cloud-Adapter to support further research.
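
To make the parameter-efficiency claim concrete, the following is a minimal sketch of how the ratio of trainable adapter parameters to frozen backbone parameters could be computed. Every size in it is an assumption chosen for illustration: the ViT-style backbone dimensions, the three-stage ConvNet, and the low-rank per-layer adapter (`ViTConfig`, `convnet_params`, `adapter_params`, `context_dim`, `rank`) are hypothetical and are not taken from the paper or its repository, so the printed ratio will not reproduce the reported 0.6%.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTConfig:
    """Hypothetical ViT-style backbone; sizes are illustrative, not from the paper."""
    depth: int = 24      # number of transformer layers
    dim: int = 1024      # token embedding width
    mlp_ratio: int = 4   # MLP hidden width relative to dim


def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus bias of a fully connected layer."""
    return n_in * n_out + n_out


def conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights plus bias of a square 2D convolution."""
    return kernel * kernel * c_in * c_out + c_out


def backbone_params(cfg: ViTConfig) -> int:
    """Frozen parameters in the transformer layers (patch embedding omitted)."""
    d, hidden = cfg.dim, cfg.dim * cfg.mlp_ratio
    attention = linear_params(d, 3 * d) + linear_params(d, d)  # QKV + output projection
    mlp = linear_params(d, hidden) + linear_params(hidden, d)
    layer_norms = 2 * (2 * d)  # two LayerNorms, each with scale and shift
    return cfg.depth * (attention + mlp + layer_norms)


def convnet_params(channels=(3, 64, 128, 256)) -> int:
    """Trainable parameters of an assumed stack of 3x3 convolutions."""
    return sum(conv_params(3, c_in, c_out)
               for c_in, c_out in zip(channels, channels[1:]))


def adapter_params(cfg: ViTConfig, context_dim: int = 256, rank: int = 16) -> int:
    """Trainable parameters of an assumed low-rank projection per frozen layer."""
    per_layer = linear_params(context_dim, rank) + linear_params(rank, cfg.dim)
    return cfg.depth * per_layer


def trainable_fraction(trainable: int, frozen: int) -> float:
    """Trainable parameter count relative to the frozen backbone."""
    return trainable / frozen


if __name__ == "__main__":
    cfg = ViTConfig()
    frozen = backbone_params(cfg)
    trainable = convnet_params() + adapter_params(cfg)
    print(f"frozen backbone parameters: {frozen:,}")
    print(f"trainable parameters:       {trainable:,}")
    print(f"trainable / frozen:         {trainable_fraction(trainable, frozen):.2%}")
```

The sketch counts only what the optimizer would update (the ConvNet and the adapter) in the numerator, while the backbone contributes to the denominator alone, which is the sense in which a frozen model can be adapted at a small fraction of its own size.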
