Transfer Learning for Fine-grained Classification Using Semi-supervised Learning and Visual Transformers
May 17, 2023
Authors: Manuel Lagunas, Brayan Impata, Victor Martinez, Virginia Fernandez, Christos Georgakis, Sofia Braun, Felipe Bertrand
cs.AI
Abstract
Fine-grained classification is a challenging task that involves identifying
subtle differences between objects within the same category. The task is
especially difficult when data is scarce. Visual transformers
(ViT) have recently emerged as a powerful tool for image classification, due to
their ability to learn highly expressive representations of visual data using
self-attention mechanisms. In this work, we explore Semi-ViT, a ViT model
fine-tuned using semi-supervised learning techniques, suitable for situations where
annotated data is lacking. This is particularly common in e-commerce,
where images are readily available but labels are noisy, nonexistent, or
expensive to obtain. Our results demonstrate that Semi-ViT outperforms
traditional convolutional neural networks (CNNs) and ViTs, even when fine-tuned
with limited annotated data. These findings indicate that Semi-ViT holds
significant promise for applications that require precise and fine-grained
classification of visual data.