Hepato-LLaVA: An Expert MLLM with Sparse Topo-Pack Attention for Hepatocellular Pathology Analysis on Whole Slide Images

Abstract

Hepatocellular carcinoma (HCC) diagnosis relies heavily on the interpretation of gigapixel whole slide images (WSIs). However, current computational approaches are constrained by fixed-resolution processing mechanisms and inefficient feature aggregation, which inevitably lead to either severe information loss or high feature redundancy. To address these challenges, we propose Hepato-LLaVA, a specialized multi-modal large language model (MLLM) designed for fine-grained hepatocellular pathology analysis. We introduce a novel Sparse Topo-Pack Attention mechanism that explicitly models 2D tissue topology. This mechanism effectively aggregates local diagnostic evidence into semantic summary tokens while preserving global context. Furthermore, to overcome the lack of multi-scale data, we present HepatoPathoVQA, a clinically grounded dataset comprising 33K hierarchically structured question-answer pairs validated by expert pathologists. Our experiments demonstrate that Hepato-LLaVA achieves state-of-the-art performance on HCC diagnosis and captioning tasks, significantly outperforming existing methods. Our code and implementation details are available at https://pris-cv.github.io/Hepto-LLaVA/.
