Automatic Image-Level Morphological Trait Annotation for Organismal Images

Abstract

Morphological traits are physical characteristics of biological organisms that provide vital clues on how organisms interact with their environment. Yet extracting these traits remains a slow, expert-driven process, limiting their use in large-scale ecological studies. A major bottleneck is the absence of high-quality datasets linking biological images to trait-level annotations. In this work, we demonstrate that sparse autoencoders trained on foundation-model features yield monosemantic, spatially grounded neurons that consistently activate on meaningful morphological parts. Leveraging this property, we introduce a trait annotation pipeline that localizes salient regions and uses vision-language prompting to generate interpretable trait descriptions. Using this approach, we construct Bioscan-Traits, a dataset of 80K trait annotations spanning 19K insect images from BIOSCAN-5M. Human evaluation confirms the biological plausibility of the generated morphological descriptions. We assess design sensitivity through a comprehensive ablation study, systematically varying key design choices and measuring their impact on the quality of the resulting trait descriptions. By annotating traits with a modular pipeline rather than prohibitively expensive manual efforts, we offer a scalable way to inject biologically meaningful supervision into foundation models, enable large-scale morphological analyses, and bridge the gap between ecological relevance and machine-learning practicality.
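The localization idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the foundation model produces one feature vector per image patch on a regular grid, that the sparse autoencoder uses a plain ReLU encoder, and that a region is taken as the bounding box of patches where one latent fires strongly. The names `sae_encode`, `latent_bbox`, and `rel_thresh` are hypothetical, the weights in the demo are hand-set rather than trained, and the vision-language prompting step is omitted.

```python
import numpy as np

def sae_encode(patch_feats, W_enc, b_enc):
    """ReLU encoder of a sparse autoencoder: (n_patches, d) -> (n_patches, k)."""
    return np.maximum(patch_feats @ W_enc + b_enc, 0.0)

def latent_bbox(acts, latent, grid_hw, rel_thresh=0.5):
    """Inclusive bounding box (row0, col0, row1, col1), in patch coordinates,
    of the patches where `latent` reaches at least rel_thresh * its peak.
    Returns None when the latent does not fire anywhere in the image."""
    h, w = grid_hw
    amap = acts[:, latent].reshape(h, w)  # patches assumed in row-major order
    peak = amap.max()
    if peak <= 0:
        return None
    rows, cols = np.nonzero(amap >= rel_thresh * peak)
    return int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max())

# Synthetic demo (assumption: 4x4 patch grid, 3-dim features, identity encoder).
h, w, d = 4, 4, 3
feats = np.zeros((h * w, d))
feats.reshape(h, w, d)[1:3, 1:3, 0] = 1.0  # feature 0 is active on a 2x2 block
acts = sae_encode(feats, np.eye(d), np.zeros(d))
box = latent_bbox(acts, latent=0, grid_hw=(h, w))  # -> (1, 1, 2, 2)
```

In a pipeline of the kind the abstract describes, a box like this would be mapped back to pixel coordinates and the cropped region passed to a vision-language model for description; how the authors select latents and thresholds is not stated here.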
