SciLT: Long-Tailed Classification in Scientific Image Domains
April 4, 2026
Authors: Jiahao Chen, Bing Su
cs.AI
Abstract
Long-tailed recognition has benefited from foundation models and fine-tuning paradigms, yet existing studies and benchmarks are mainly confined to natural image domains, where pre-training and fine-tuning data share similar distributions. In contrast, scientific images exhibit distinct visual characteristics and supervision signals, raising questions about the effectiveness of fine-tuning foundation models in such settings. In this work, we investigate scientific long-tailed recognition under a purely visual and parameter-efficient fine-tuning (PEFT) paradigm. Experiments on three scientific benchmarks show that fine-tuning foundation models yields limited gains, and reveal that penultimate-layer features play an important role, particularly for tail classes. Motivated by these findings, we propose SciLT, a framework that exploits multi-level representations through adaptive feature fusion and dual-supervision learning. By jointly leveraging penultimate- and final-layer features, SciLT achieves balanced performance across head and tail classes. Extensive experiments demonstrate that SciLT consistently outperforms existing methods, establishing a strong and practical baseline for scientific long-tailed recognition and providing valuable guidance for adapting foundation models to scientific data with substantial domain shifts.
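The abstract describes SciLT as combining adaptive feature fusion over penultimate- and final-layer features with dual supervision. The following is a minimal NumPy sketch of that general idea only: the sigmoid gate, the feature dimensions, the fixed random weights, and the 0.5 auxiliary-loss weight are all illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, targets):
    probs = softmax(logits)
    return -np.log(probs[np.arange(len(targets)), targets] + 1e-12).mean()

# Hypothetical sizes: feature dim D, classes C, batch B (not from the paper).
D, C, B = 8, 4, 5
W_gate = rng.normal(size=(2 * D, 1)) * 0.1    # gate over both feature levels
W_fused = rng.normal(size=(D, C)) * 0.1       # classifier on fused features
W_penult = rng.normal(size=(D, C)) * 0.1      # auxiliary penultimate-only head

def forward(penult, final):
    """Adaptive fusion: a per-sample gate alpha in (0, 1) mixes the
    penultimate- and final-layer features; dual supervision means both
    the fused head and an auxiliary penultimate head produce logits."""
    gate_in = np.concatenate([penult, final], axis=-1)
    alpha = 1.0 / (1.0 + np.exp(-(gate_in @ W_gate)))   # shape (B, 1)
    fused = alpha * penult + (1 - alpha) * final
    return fused @ W_fused, penult @ W_penult

penult = rng.normal(size=(B, D))
final = rng.normal(size=(B, D))
targets = rng.integers(0, C, size=B)

logits_fused, logits_penult = forward(penult, final)
# Dual-supervision objective: both heads see the same labels.
loss = cross_entropy(logits_fused, targets) + 0.5 * cross_entropy(logits_penult, targets)
```

In this sketch the auxiliary penultimate head gives the shallower features their own supervision signal, which is one plausible way to preserve their contribution for tail classes rather than letting the final-layer features dominate training.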