One Click per Cell Type Is Enough: Cell Instance Segmentation via Training-Free Group Interaction

Abstract

Cell instance segmentation models trained on cell-specific datasets suffer severe performance drops on out-of-distribution cell types. Interactive foundation models overcome this through per-instance prompting, but the cost is prohibitive for histopathology images containing hundreds to thousands of densely packed instances. We introduce Group Prompting, a new paradigm that shifts interactive segmentation from per-instance O(N) prompting to per-type O(T) prompting, where a single click per cell type suffices to segment all instances of that type. Our key observation is that the frozen image encoder of the Segment Anything Model (SAM) already clusters same-type cells in its feature space before any prompt is given. Exploiting this property, we propose Chain-of-Prompts (CoP), a training-free framework that recursively expands a single user click by (1) identifying reliable same-type locations through non-parametric gating of multi-scale encoder features, and (2) selecting the most spatially distant reliable point as the next prompt to maximize coverage. On three cell-type-annotated benchmarks, CoP with one click per type retains over 90% of the performance of per-instance prompting and surpasses fully supervised methods without any additional training. On four morphologically homogeneous benchmarks, a single click retains over 99%. Project page: https://shjo-april.github.io/Chain-of-Prompts/
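
The two-step expansion described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the cosine-similarity measure, the per-scale mean used as a data-driven gate, and the assumption that all scales are resampled to a common grid are hypothetical choices made for illustration, since the abstract does not specify them. The sketch only produces point prompts; passing them to SAM's mask decoder is omitted.

```python
import numpy as np


def cosine_similarity_map(features, point):
    """Cosine similarity between the feature at `point` and every location.

    `features` has shape (H, W, C); `point` is a (row, col) pair.
    """
    norms = np.linalg.norm(features, axis=-1, keepdims=True)
    normed = features / (norms + 1e-8)
    row, col = point
    return normed @ normed[row, col]


def reliable_mask(multi_scale_features, point):
    """Keep locations whose similarity exceeds the per-scale mean at every scale.

    Assumption: the mean threshold stands in for the paper's non-parametric
    gating, whose exact form is not given in the abstract.
    """
    mask = None
    for features in multi_scale_features:
        sim = cosine_similarity_map(features, point)
        gate = sim > sim.mean()
        mask = gate if mask is None else (mask & gate)
    return mask


def chain_of_prompts(multi_scale_features, click, num_prompts):
    """Expand one user click into up to `num_prompts` point prompts."""
    prompts = [tuple(int(v) for v in click)]
    while len(prompts) < num_prompts:
        # Step 1: gate from the most recent prompt to find reliable locations.
        mask = reliable_mask(multi_scale_features, prompts[-1])
        candidates = np.argwhere(mask)
        if candidates.size == 0:
            break
        # Step 2: pick the candidate farthest from all prompts chosen so far.
        chosen = np.array(prompts)
        diffs = candidates[:, None, :] - chosen[None, :, :]
        min_dist = np.linalg.norm(diffs, axis=-1).min(axis=1)
        best = int(np.argmax(min_dist))
        if min_dist[best] == 0:
            break  # every reliable location is already a prompt
        prompts.append(tuple(int(v) for v in candidates[best]))
    return prompts


# Toy example: a 6x6 grid whose left half is "type A" and right half "type B".
toy = np.zeros((6, 6, 2))
toy[:, :3, 0] = 1.0  # type A features
toy[:, 3:, 1] = 1.0  # type B features
toy_prompts = chain_of_prompts([toy, 2.0 * toy], click=(0, 0), num_prompts=3)
print(toy_prompts)
```

In this toy setting a single click on a type-A location yields further prompts that stay inside the type-A region and spread out across it, which is the coverage behavior the abstract attributes to farthest-point selection.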