Hepato-LLaVA: An Expert MLLM with Sparse Topo-Pack Attention for Hepatocellular Pathology Analysis on Whole Slide Images

Abstract

Diagnosis of hepatocellular carcinoma (HCC) relies heavily on the interpretation of gigapixel Whole Slide Images. However, current computational approaches are constrained by fixed-resolution processing mechanisms and inefficient feature aggregation, which lead to either severe information loss or high feature redundancy. To address these challenges, we propose Hepato-LLaVA, a specialized Multi-modal Large Language Model designed for fine-grained hepatocellular pathology analysis. We introduce a novel Sparse Topo-Pack Attention mechanism that explicitly models 2D tissue topology. This mechanism aggregates local diagnostic evidence into semantic summary tokens while preserving global context. Furthermore, to overcome the lack of multi-scale data, we present HepatoPathoVQA, a clinically grounded dataset comprising 33K hierarchically structured question-answer pairs validated by expert pathologists. Our experiments demonstrate that Hepato-LLaVA achieves state-of-the-art performance on HCC diagnosis and captioning tasks, significantly outperforming existing methods. Our code and implementation details are available at https://pris-cv.github.io/Hepto-LLaVA/.
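The abstract does not specify how Sparse Topo-Pack Attention is implemented. The sketch below is only one possible reading, offered as an illustration and not as the authors' method. It assumes that patch tokens lie on a 2D grid, that the grid is split into square windows ("packs"), that each pack gets one summary token, that patches attend only within their own pack and to its summary token, and that summary tokens attend to one another to carry global context. The function name `topo_pack_mask` and the parameters `grid_h`, `grid_w`, and `pack` are hypothetical.

```python
import numpy as np


def topo_pack_mask(grid_h: int, grid_w: int, pack: int) -> np.ndarray:
    """Build a boolean attention mask (True = may attend).

    Assumed token order: all patch tokens in row-major order,
    followed by one summary token per pack x pack window.
    This layout is an illustrative assumption, not the paper's.
    """
    packs_h = -(-grid_h // pack)  # ceiling division
    packs_w = -(-grid_w // pack)
    n_patch = grid_h * grid_w
    n_pack = packs_h * packs_w

    # Map every patch to the 2D window (pack) that contains it.
    rows, cols = np.divmod(np.arange(n_patch), grid_w)
    pack_id = (rows // pack) * packs_w + (cols // pack)

    n = n_patch + n_pack
    mask = np.zeros((n, n), dtype=bool)

    # Local attention: patches see only patches in the same pack.
    mask[:n_patch, :n_patch] = pack_id[:, None] == pack_id[None, :]

    # Each patch and its own pack's summary token see each other.
    own = pack_id[:, None] == np.arange(n_pack)[None, :]
    mask[:n_patch, n_patch:] = own
    mask[n_patch:, :n_patch] = own.T

    # Global context: summary tokens attend to all summary tokens.
    mask[n_patch:, n_patch:] = True
    return mask


mask = topo_pack_mask(grid_h=4, grid_w=4, pack=2)
```

Under these assumptions, a 4x4 grid with 2x2 packs yields 20 tokens, and the mask allows 112 of the 400 possible query-key pairs. The allowed fraction shrinks as the grid grows relative to the pack size, which is the kind of saving a sparse scheme is meant to provide.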
