Transfer Learning for Fine-grained Classification Using Semi-supervised Learning and Visual Transformers

Abstract

Fine-grained classification is a challenging task that involves identifying subtle differences between objects within the same category, and it becomes even harder when data is scarce. Visual transformers (ViTs) have recently emerged as a powerful tool for image classification, owing to their ability to learn highly expressive representations of visual data through self-attention. In this work, we explore Semi-ViT, a ViT model fine-tuned with semi-supervised learning techniques, which suits settings where annotated data is limited. Such settings are particularly common in e-commerce, where images are readily available but labels are often noisy, missing, or expensive to obtain. Our results show that Semi-ViT outperforms conventional convolutional neural networks (CNNs) and ViTs, even when fine-tuned with limited annotated data. These findings suggest that Semi-ViT holds significant promise for applications that require precise, fine-grained classification of visual data.
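The abstract does not spell out the semi-supervised fine-tuning procedure. As a minimal sketch, assuming a generic teacher-student scheme that is common in semi-supervised learning (a teacher whose weights are an exponential moving average (EMA) of the student's produces pseudo-labels on unlabeled images, and only confident pseudo-labels contribute to the student's loss), the core pieces can be written in NumPy as below. The function names, the 0.7 confidence threshold, and the momentum value are hypothetical illustrations, not the authors' implementation; model forward passes and gradient steps are omitted.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def pseudo_label(teacher_logits, threshold=0.7):
    """Hypothetical helper: hard pseudo-labels plus a mask of confident samples.

    The threshold value is an illustrative assumption, not taken from the paper.
    """
    probs = softmax(teacher_logits)
    labels = probs.argmax(axis=-1)          # most likely class per unlabeled image
    mask = probs.max(axis=-1) >= threshold  # keep only confident predictions
    return labels, mask


def ema_update(teacher_params, student_params, momentum=0.9999):
    # Teacher weights track an exponential moving average of student weights.
    return {
        name: momentum * teacher_params[name] + (1.0 - momentum) * student_params[name]
        for name in teacher_params
    }


def masked_cross_entropy(student_logits, labels, mask):
    # Cross-entropy of the student on the unlabeled samples whose pseudo-label
    # passed the confidence threshold; returns 0.0 if none passed.
    if not mask.any():
        return 0.0
    probs = softmax(student_logits[mask])
    picked = probs[np.arange(len(probs)), labels[mask]]
    return float(-np.log(picked + 1e-12).mean())


if __name__ == "__main__":
    teacher_logits = np.array([[4.0, 0.0, 0.0], [0.2, 0.1, 0.0]])
    labels, mask = pseudo_label(teacher_logits)
    print(labels.tolist(), mask.tolist())
```

In this toy batch, the first unlabeled sample is predicted confidently and receives a pseudo-label, while the second (nearly uniform logits) is masked out. Thresholding of this kind may matter for fine-grained data, where a teacher is likely to be uncertain between visually similar classes and a wrong hard label would otherwise be reinforced.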
