Towards Visual Re-Identification of Fish using Fine-Grained Classification for Electronic Monitoring in Fisheries
December 9, 2025
Authors: Samitha Nuwan Thilakarathna, Ercan Avsar, Martin Mathias Nielsen, Malte Pedersen
cs.AI
Abstract
Accurate fisheries data are crucial for effective and sustainable marine resource management. With the recent adoption of Electronic Monitoring (EM) systems, more video data is now being collected than can feasibly be reviewed manually. This paper addresses this challenge by developing an optimized deep learning pipeline for automated fish re-identification (Re-ID) using the novel AutoFish dataset, which simulates conveyor-belt-based EM systems and contains six similar-looking fish species. We demonstrate that key Re-ID metrics (R1 and mAP@k) are substantially improved by using hard triplet mining in conjunction with a custom image transformation pipeline that includes dataset-specific normalization. With these strategies, the Vision Transformer-based Swin-T architecture consistently outperforms the Convolutional Neural Network-based ResNet-50, achieving peak performance of 41.65% mAP@k and 90.43% Rank-1 accuracy. An in-depth analysis reveals that the primary challenge is distinguishing visually similar individuals of the same species (intra-species errors), where viewpoint inconsistency proves significantly more detrimental than partial occlusion. The source code and documentation are available at: https://github.com/msamdk/Fish_Re_Identification.git
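The hard triplet mining referred to in the abstract is commonly implemented as batch-hard mining (Hermans et al., "In Defense of the Triplet Loss"): within each mini-batch, every anchor is paired with its *hardest* positive (most distant embedding of the same identity) and *hardest* negative (closest embedding of a different identity). The following NumPy sketch illustrates the idea under that assumption; it is not the authors' implementation (see the linked repository for that), and the function name and margin value are illustrative.

```python
import numpy as np

def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """Batch-hard triplet loss over a mini-batch of embeddings.

    embeddings : (N, D) float array of L2-comparable feature vectors
    labels     : (N,) integer identity labels (e.g., individual fish IDs)
    margin     : desired separation between positive and negative distances
    """
    # Pairwise Euclidean distance matrix (small epsilon keeps sqrt stable).
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1) + 1e-12)
    same_id = labels[:, None] == labels[None, :]

    losses = []
    for i in range(len(labels)):
        pos_mask = same_id[i].copy()
        pos_mask[i] = False          # an anchor is not its own positive
        neg_mask = ~same_id[i]
        if not pos_mask.any() or not neg_mask.any():
            continue                 # anchor has no valid triplet in this batch
        d_ap = dist[i][pos_mask].max()   # hardest (farthest) positive
        d_an = dist[i][neg_mask].min()   # hardest (closest) negative
        losses.append(max(d_ap - d_an + margin, 0.0))
    return float(np.mean(losses)) if losses else 0.0
```

With well-separated identity clusters the loss is zero; when embeddings of different individuals collapse together, the margin term dominates, which is the gradient signal that drives the Re-ID encoder apart.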