Evaluating Deep Learning Models for African Wildlife Image Classification: From DenseNet to Vision Transformers

Abstract

Wildlife populations in Africa face severe threats, with vertebrate numbers declining by over 65% in the past five decades. In response, image classification using deep learning has emerged as a promising tool for biodiversity monitoring and conservation. This paper presents a comparative study of deep learning models for automatically classifying African wildlife images, focusing on transfer learning with frozen feature extractors. Using a public dataset of four species (buffalo, elephant, rhinoceros, and zebra), we evaluate the performance of DenseNet-201, ResNet-152, EfficientNet-B4, and the Vision Transformer ViT-H/14. DenseNet-201 achieved the best performance among the convolutional networks (67% accuracy), while ViT-H/14 achieved the highest overall accuracy (99%) but at a significantly higher computational cost, which raises deployment concerns. Our experiments highlight the trade-offs between accuracy, resource requirements, and deployability. The best-performing CNN (DenseNet-201) was integrated into a Hugging Face Gradio Space for real-time field use, demonstrating the feasibility of deploying lightweight models in conservation settings. This work contributes to African-grounded AI research by offering practical insights into model selection, dataset preparation, and the responsible deployment of deep learning tools for wildlife conservation.
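As an illustration of what transfer learning with a frozen feature extractor involves, the following is a minimal PyTorch sketch, not the authors' implementation. The small convolutional `backbone` is a hypothetical stand-in for a pretrained network such as DenseNet-201, and the random tensors stand in for a batch of wildlife images. The optimizer, learning rate, and input size are likewise assumptions chosen only to keep the example self-contained.

```python
import torch
from torch import nn

torch.manual_seed(0)

NUM_CLASSES = 4  # buffalo, elephant, rhinoceros, zebra

# Hypothetical stand-in for a pretrained backbone (assumption: in practice
# this would be a network such as DenseNet-201 with pretrained weights).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

# Freeze the feature extractor: its weights receive no gradient updates.
for param in backbone.parameters():
    param.requires_grad = False
backbone.eval()

# Only this linear classification head is trained.
head = nn.Linear(8, NUM_CLASSES)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Random tensors standing in for a batch of images and their labels.
images = torch.randn(16, 3, 32, 32)
labels = torch.randint(0, NUM_CLASSES, (16,))

# Snapshots used to confirm which parameters change during training.
backbone_before = [p.clone() for p in backbone.parameters()]
head_before = head.weight.detach().clone()

for _ in range(5):
    with torch.no_grad():  # features come from the frozen backbone
        features = backbone(images)
    logits = head(features)
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

backbone_unchanged = all(
    torch.equal(before, after)
    for before, after in zip(backbone_before, backbone.parameters())
)
head_changed = not torch.equal(head_before, head.weight)
print(f"backbone unchanged: {backbone_unchanged}, head updated: {head_changed}")
```

Because only the head's parameters are passed to the optimizer, training cost is dominated by the forward pass through the backbone, which is one reason a much larger extractor such as ViT-H/14 is more expensive to use even when frozen.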
