Pathology-CoT: Learning Visual Chain-of-Thought Agent from Expert Whole Slide Image Diagnosis Behavior

Abstract

Diagnosing a whole-slide image (WSI) is an interactive, multi-stage process that involves changing magnification and moving between fields of view. Although recent pathology foundation models are strong, practical agentic systems that decide which field to examine next, adjust magnification, and deliver explainable diagnoses are still lacking. The blocker is data: scalable, clinically aligned supervision of expert viewing behavior, which is tacit and experience-based, is not written in textbooks or online, and is therefore absent from large language model training. We introduce the AI Session Recorder, which works with standard WSI viewers to unobtrusively record routine navigation and convert the viewer logs into standardized behavioral commands (inspect or peek at discrete magnifications) and bounding boxes. A lightweight human-in-the-loop review turns AI-drafted rationales into the Pathology-CoT dataset, a form of paired "where to look" and "why it matters" supervision produced in roughly one-sixth of the conventional labeling time. Using this behavioral data, we build Pathologist-o3, a two-stage agent that first proposes regions of interest and then performs behavior-guided reasoning. On gastrointestinal lymph-node metastasis detection, it achieved 84.5% precision, 100.0% recall, and 75.4% accuracy, exceeding the state-of-the-art OpenAI o3 model and generalizing across backbones. To our knowledge, this is one of the first behavior-grounded agentic systems in pathology. By turning everyday viewer logs into scalable, expert-validated supervision, our framework makes agentic pathology practical and establishes a path to human-aligned, upgradeable clinical AI.
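
The abstract describes converting viewer logs into behavioral commands (inspect or peek at discrete magnifications) with bounding boxes, but does not give the conversion rule. The following is a minimal Python sketch of one way such a step could look, not the authors' implementation. The log schema (`ViewerEvent`), the magnification levels in `DISCRETE_MAGS`, and the dwell-time threshold `INSPECT_MIN_DWELL_S` used to separate "peek" from "inspect" are all hypothetical assumptions introduced for illustration.

```python
from dataclasses import dataclass

# Hypothetical discrete magnification levels; the source does not list them.
DISCRETE_MAGS = (2.5, 5.0, 10.0, 20.0, 40.0)
# Hypothetical dwell threshold (seconds) separating "peek" from "inspect".
INSPECT_MIN_DWELL_S = 2.0


@dataclass(frozen=True)
class ViewerEvent:
    """One raw viewport sample from a WSI viewer log (assumed schema)."""
    t: float    # timestamp in seconds
    mag: float  # continuous viewer magnification, e.g. 17.3
    x: float    # viewport left edge, in level-0 pixels
    y: float    # viewport top edge, in level-0 pixels
    w: float    # viewport width, in level-0 pixels
    h: float    # viewport height, in level-0 pixels


@dataclass(frozen=True)
class Command:
    """A standardized behavioral command with its bounding box."""
    action: str                              # "inspect" or "peek"
    mag: float                               # snapped discrete magnification
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1)
    dwell_s: float                           # time spent on this view


def snap_mag(mag: float) -> float:
    """Snap a continuous magnification to the nearest assumed discrete level."""
    return min(DISCRETE_MAGS, key=lambda m: abs(m - mag))


def to_commands(events: list[ViewerEvent]) -> list[Command]:
    """Turn consecutive viewport samples into inspect/peek commands.

    The dwell time of a view is taken as the gap to the next event, so the
    final event (whose dwell is unknown) produces no command.
    """
    commands = []
    for cur, nxt in zip(events, events[1:]):
        dwell = nxt.t - cur.t
        action = "inspect" if dwell >= INSPECT_MIN_DWELL_S else "peek"
        bbox = (cur.x, cur.y, cur.x + cur.w, cur.y + cur.h)
        commands.append(Command(action, snap_mag(cur.mag), bbox, dwell))
    return commands
```

In this sketch a short overview glance followed by a longer stay at higher magnification would yield a "peek" command and then an "inspect" command, each carrying the viewport as its bounding box. A real recorder would likely also merge near-identical viewports and filter panning jitter, which is omitted here.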
