Pathology-CoT: Learning Visual Chain-of-Thought Agent from Expert Whole Slide Image Diagnosis Behavior
October 6, 2025
Authors: Sheng Wang, Ruiming Wu, Charles Herndon, Yihang Liu, Shunsuke Koga, Jeanne Shen, Zhi Huang
cs.AI
Abstract
Diagnosing a whole-slide image is an interactive, multi-stage process
involving changes in magnification and movement between fields. Although recent
pathology foundation models are strong, practical agentic systems that decide
what field to examine next, adjust magnification, and deliver explainable
diagnoses are still lacking. The blocker is data: expert viewing behavior is
tacit and experience-based, not written in textbooks or online, and therefore
absent from large language model training, leaving no scalable, clinically
aligned supervision. We introduce the AI Session Recorder, which works with standard WSI
viewers to unobtrusively record routine navigation and convert the viewer logs
into standardized behavioral commands (inspect or peek at discrete
magnifications) and bounding boxes. A lightweight human-in-the-loop review
turns AI-drafted rationales into the Pathology-CoT dataset, a form of paired
"where to look" and "why it matters" supervision produced in roughly one-sixth
the labeling time. Using this behavioral data, we build Pathologist-o3, a
two-stage agent that first proposes regions of interest and then performs
behavior-guided reasoning. On gastrointestinal lymph-node metastasis detection,
it achieved 84.5% precision, 100.0% recall, and 75.4% accuracy, exceeding the
state-of-the-art OpenAI o3 model and generalizing across backbones. To our
knowledge, this constitutes one of the first behavior-grounded agentic systems
in pathology. Turning everyday viewer logs into scalable, expert-validated
supervision, our framework makes agentic pathology practical and establishes a
path to human-aligned, upgradeable clinical AI.
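To make the pipeline concrete, here is a minimal sketch of how raw viewer-log events could be converted into the standardized behavioral commands the abstract describes (inspect or peek at discrete magnifications, plus bounding boxes). The class names, dwell-time threshold, and magnification levels are illustrative assumptions, not the paper's actual schema:

```python
from dataclasses import dataclass

@dataclass
class ViewEvent:
    """One hypothetical raw navigation event from a WSI viewer log."""
    x: float          # viewport origin in slide coordinates (assumed)
    y: float
    w: float          # viewport width/height in slide coordinates
    h: float
    magnification: float
    dwell_ms: int     # time spent on this field of view

# Assumed thresholds: long dwells become "inspect", brief ones "peek".
INSPECT_DWELL_MS = 2000
MAGS = [1.25, 5, 10, 20, 40]  # assumed discrete objective magnifications

def snap_mag(m: float) -> float:
    """Snap a continuous zoom level to the nearest discrete magnification."""
    return min(MAGS, key=lambda level: abs(level - m))

def to_command(ev: ViewEvent) -> dict:
    """Convert one raw viewer event into a standardized behavioral command."""
    action = "inspect" if ev.dwell_ms >= INSPECT_DWELL_MS else "peek"
    return {
        "action": action,
        "magnification": snap_mag(ev.magnification),
        "bbox": (ev.x, ev.y, ev.w, ev.h),
    }

# Example: a long look near 10x and a quick glance near 40x.
log = [
    ViewEvent(1200, 3400, 800, 600, magnification=9.6, dwell_ms=3500),
    ViewEvent(5000, 1000, 400, 300, magnification=38.0, dwell_ms=400),
]
commands = [to_command(ev) for ev in log]
```

Commands in this form pair naturally with AI-drafted rationales for human review: each entry says where the expert looked and how closely, and the reviewed rationale supplies why it mattered.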