SciLT: Long-Tailed Classification in Scientific Image Domains

Abstract

Long-tailed recognition has benefited from foundation models and fine-tuning paradigms, yet existing studies and benchmarks are mainly confined to natural image domains, where pre-training and fine-tuning data share similar distributions. In contrast, scientific images exhibit distinct visual characteristics and supervision signals, raising questions about the effectiveness of fine-tuning foundation models in such settings. In this work, we investigate scientific long-tailed recognition under a purely visual and parameter-efficient fine-tuning (PEFT) paradigm. Experiments on three scientific benchmarks show that fine-tuning foundation models yields limited gains, and reveal that penultimate-layer features play an important role, particularly for tail classes. Motivated by these findings, we propose SciLT, a framework that exploits multi-level representations through adaptive feature fusion and dual-supervision learning. By jointly leveraging penultimate- and final-layer features, SciLT achieves balanced performance across head and tail classes. Extensive experiments demonstrate that SciLT consistently outperforms existing methods, establishing a strong and practical baseline for scientific long-tailed recognition and providing valuable guidance for adapting foundation models to scientific data with substantial domain shifts.
