Pathology-CoT: Learning Visual Chain-of-Thought Agent from Expert Whole Slide Image Diagnosis Behavior

Abstract

Diagnosing a whole-slide image (WSI) is an interactive, multi-stage process that involves changing magnification and moving between fields of view. Although recent pathology foundation models perform strongly, practical agentic systems that decide which field to examine next, adjust magnification, and deliver explainable diagnoses are still lacking. The bottleneck is data: expert viewing behavior is tacit and experience-based, not written in textbooks or online, and therefore absent from large language model training, so scalable, clinically aligned supervision of that behavior is missing. We introduce the AI Session Recorder, which works with standard WSI viewers to unobtrusively record routine navigation and convert the viewer logs into standardized behavioral commands (inspect or peek at discrete magnifications) with bounding boxes. A lightweight human-in-the-loop review turns AI-drafted rationales into the Pathology-CoT dataset, a form of paired "where to look" and "why it matters" supervision produced with roughly six-fold less labeling time. Using this behavioral data, we build Pathologist-o3, a two-stage agent that first proposes regions of interest and then performs behavior-guided reasoning. On gastrointestinal lymph-node metastasis detection, it achieves 84.5% precision, 100.0% recall, and 75.4% accuracy, exceeding the state-of-the-art OpenAI o3 model, and it generalizes across backbones. To our knowledge, this is one of the first behavior-grounded agentic systems in pathology. By turning everyday viewer logs into scalable, expert-validated supervision, our framework makes agentic pathology practical and establishes a path to human-aligned, upgradeable clinical AI.
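The abstract describes the recorder's log-to-command conversion only at a high level. As a minimal sketch, assuming a viewer that logs timestamped viewport rectangles together with a continuous magnification, the Python below snaps each event to a discrete magnification level, merges consecutive events at the same level into one bounding box, and labels each merged segment "inspect" or "peek" by total dwell time. The event format, the magnification ladder (`MAG_LEVELS`), the dwell threshold (`INSPECT_MIN_DWELL_S`), and all class and function names are hypothetical illustrations, not the AI Session Recorder's actual interface or rules.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical discrete magnification ladder (objective powers); an assumption.
MAG_LEVELS = (1.25, 2.5, 5.0, 10.0, 20.0, 40.0)
# Hypothetical dwell threshold (seconds): shorter segments count as a "peek".
INSPECT_MIN_DWELL_S = 2.0


@dataclass
class ViewerEvent:
    """One hypothetical viewer log entry: viewport in level-0 pixel coordinates."""
    t: float    # timestamp in seconds
    x: float    # viewport top-left x
    y: float    # viewport top-left y
    w: float    # viewport width
    h: float    # viewport height
    mag: float  # continuous magnification reported by the viewer


@dataclass
class Command:
    """A standardized behavioral command with a bounding box (x0, y0, x1, y1)."""
    action: str
    mag: float
    bbox: Tuple[float, float, float, float]
    dwell_s: float


def snap_mag(mag: float) -> float:
    """Snap a continuous magnification to the nearest discrete level."""
    return min(MAG_LEVELS, key=lambda level: abs(level - mag))


def events_to_commands(events: List[ViewerEvent], session_end: float) -> List[Command]:
    """Merge consecutive same-level events, then label each segment by dwell time."""
    commands: List[Command] = []
    for i, ev in enumerate(events):
        # Dwell lasts until the next event (or the end of the session).
        next_t = events[i + 1].t if i + 1 < len(events) else session_end
        dwell = max(0.0, next_t - ev.t)
        mag = snap_mag(ev.mag)
        box = (ev.x, ev.y, ev.x + ev.w, ev.y + ev.h)
        if commands and commands[-1].mag == mag:
            # Same discrete level as the previous segment: grow its bounding box.
            prev = commands[-1]
            x0, y0, x1, y1 = prev.bbox
            prev.bbox = (min(x0, box[0]), min(y0, box[1]),
                         max(x1, box[2]), max(y1, box[3]))
            prev.dwell_s += dwell
        else:
            commands.append(Command("", mag, box, dwell))
    for cmd in commands:
        cmd.action = "inspect" if cmd.dwell_s >= INSPECT_MIN_DWELL_S else "peek"
    return commands
```

For example, a quick low-power overview, a few seconds of panning at about 10x, and a brief glance at 40x would yield a peek, an inspect, and a peek. Merging by discrete level is one simple design choice that keeps small pans inside a single command; a real recorder might also split segments when the viewport jumps far across the slide.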
