Pathology-CoT: Learning Visual Chain-of-Thought Agent from Expert Whole Slide Image Diagnosis Behavior
October 6, 2025
Authors: Sheng Wang, Ruiming Wu, Charles Herndon, Yihang Liu, Shunsuke Koga, Jeanne Shen, Zhi Huang
cs.AI
Abstract
Diagnosing a whole-slide image is an interactive, multi-stage process
involving changes in magnification and movement between fields. Although recent
pathology foundation models are strong, practical agentic systems that decide
what field to examine next, adjust magnification, and deliver explainable
diagnoses are still lacking. The blocker is data: expert viewing behavior is
tacit and experience-based, not written in textbooks or online, and therefore
absent from large language model training, leaving no scalable, clinically
aligned supervision. We introduce the AI Session Recorder, which works with standard WSI
viewers to unobtrusively record routine navigation and convert the viewer logs
into standardized behavioral commands (inspect or peek at discrete
magnifications) and bounding boxes. A lightweight human-in-the-loop review
turns AI-drafted rationales into the Pathology-CoT dataset, a form of paired
"where to look" and "why it matters" supervision produced in roughly one-sixth
the labeling time. Using this behavioral data, we build Pathologist-o3, a
two-stage agent that first proposes regions of interest and then performs
behavior-guided reasoning. On gastrointestinal lymph-node metastasis detection,
it achieved 84.5% precision, 100.0% recall, and 75.4% accuracy, exceeding the
state-of-the-art OpenAI o3 model and generalizing across backbones. To our
knowledge, this constitutes one of the first behavior-grounded agentic systems
in pathology. Turning everyday viewer logs into scalable, expert-validated
supervision, our framework makes agentic pathology practical and establishes a
path to human-aligned, upgradeable clinical AI.
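To make the pipeline concrete, here is a minimal sketch of how raw viewer-log events could be converted into the standardized behavioral commands the abstract describes (inspect or peek at discrete magnifications, plus bounding boxes). The class names, dwell-time threshold, and magnification levels are illustrative assumptions, not the paper's actual schema:

```python
from dataclasses import dataclass

@dataclass
class ViewEvent:
    """One hypothetical raw navigation event from a WSI viewer log."""
    x: float          # viewport origin in slide coordinates (assumed)
    y: float
    w: float          # viewport width/height in slide coordinates
    h: float
    magnification: float
    dwell_ms: int     # time spent on this field of view

# Assumed thresholds: long dwells become "inspect", brief ones "peek".
INSPECT_DWELL_MS = 2000
MAGS = [1.25, 5, 10, 20, 40]  # assumed discrete objective magnifications

def snap_mag(m: float) -> float:
    """Snap a continuous zoom level to the nearest discrete magnification."""
    return min(MAGS, key=lambda level: abs(level - m))

def to_command(ev: ViewEvent) -> dict:
    """Convert one raw viewer event into a standardized behavioral command."""
    action = "inspect" if ev.dwell_ms >= INSPECT_DWELL_MS else "peek"
    return {
        "action": action,
        "magnification": snap_mag(ev.magnification),
        "bbox": (ev.x, ev.y, ev.w, ev.h),
    }

# Example: a long look near 10x and a quick glance near 40x.
log = [
    ViewEvent(1200, 3400, 800, 600, magnification=9.6, dwell_ms=3500),
    ViewEvent(5000, 1000, 400, 300, magnification=38.0, dwell_ms=400),
]
commands = [to_command(ev) for ev in log]
```

Commands in this form pair naturally with AI-drafted rationales for human review: each entry says where the expert looked and how closely, and the reviewed rationale supplies why it mattered.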