Transfer Learning for Fine-grained Classification Using Semi-supervised Learning and Visual Transformers
May 17, 2023
Authors: Manuel Lagunas, Brayan Impata, Victor Martinez, Virginia Fernandez, Christos Georgakis, Sofia Braun, Felipe Bertrand
cs.AI
Abstract
Fine-grained classification is a challenging task that involves identifying
subtle differences between objects within the same category. It is even
harder in scenarios where data is scarce. Visual transformers
(ViTs) have recently emerged as a powerful tool for image classification, due to
their ability to learn highly expressive representations of visual data using
self-attention mechanisms. In this work, we explore Semi-ViT, a ViT model
fine-tuned using semi-supervised learning techniques, suitable for situations where
annotated data is lacking. This is particularly common in e-commerce,
where images are readily available but labels are noisy, nonexistent, or
expensive to obtain. Our results demonstrate that Semi-ViT outperforms
traditional convolutional neural networks (CNNs) and ViTs, even when fine-tuned
with limited annotated data. These findings indicate that Semi-ViT holds
significant promise for applications that require precise and fine-grained
classification of visual data.
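
The abstract does not spell out which semi-supervised technique is used for fine-tuning. As a rough illustration of the general idea only, the sketch below shows confidence-thresholded pseudo-labeling, one common semi-supervised approach: a model's predictions on unlabeled images are kept as training labels only when the predicted class probability is high enough. The function names, the threshold value of 0.9, and the example logits are all assumptions made for this illustration; this is not the authors' procedure.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_pseudo_labels(
    unlabeled_logits: List[List[float]],
    threshold: float = 0.9,  # hypothetical confidence cutoff, chosen for illustration
) -> List[Tuple[int, int]]:
    """Return (sample_index, predicted_class) pairs for confident predictions.

    Samples whose top class probability falls below `threshold` are skipped,
    so they contribute no pseudo-label in this round.
    """
    selected = []
    for idx, logits in enumerate(unlabeled_logits):
        probs = softmax(logits)
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            selected.append((idx, label))
    return selected


if __name__ == "__main__":
    # Made-up logits for three unlabeled images over three classes.
    logits = [
        [4.0, 0.0, 0.0],  # confident in class 0
        [0.2, 0.1, 0.0],  # near-uniform, should be skipped
        [0.0, 0.0, 5.0],  # confident in class 2
    ]
    print(select_pseudo_labels(logits))  # [(0, 0), (2, 2)]
```

In a training loop, the selected pairs would typically be mixed with the small labeled set for the next fine-tuning step, while low-confidence samples stay unlabeled until the model improves.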