How Far Are Surgeons from Surgical World Models? A Pilot Study on Zero-shot Surgical Video Generation with Expert Assessment
November 3, 2025
Authors: Zhen Chen, Qing Xu, Jinlin Wu, Biao Yang, Yuhao Zhai, Geng Guo, Jing Zhang, Yinlu Ding, Nassir Navab, Jiebo Luo
cs.AI
Abstract
Foundation models in video generation are demonstrating remarkable
capabilities as potential world models for simulating the physical world.
However, their application in high-stakes domains such as surgery, which demand
deep, specialized causal knowledge rather than general physical rules, remains
critically unexplored. To systematically address this challenge, we present
SurgVeo, the first expert-curated benchmark for video generation model
evaluation in surgery, and the Surgical Plausibility Pyramid (SPP), a novel,
four-tiered framework tailored to assess model outputs from basic appearance to
complex surgical strategy. Using the SurgVeo benchmark, we task the advanced
Veo-3 model with zero-shot prediction on surgical clips from laparoscopic and
neurosurgical procedures. A panel of four board-certified
surgeons evaluates the generated videos according to the SPP. Our results
reveal a distinct "plausibility gap": while Veo-3 achieves exceptional Visual
Perceptual Plausibility, it fails critically at higher levels of the SPP,
including Instrument Operation Plausibility, Environment Feedback Plausibility,
and Surgical Intent Plausibility. This work provides the first quantitative
evidence of the chasm between visually convincing mimicry and causal
understanding in surgical AI. Our findings from SurgVeo and the SPP establish a
crucial foundation and roadmap for developing future models capable of
navigating the complexities of specialized, real-world healthcare domains.
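To make the structure of the SPP evaluation more concrete, the following is a minimal Python sketch of how per-tier expert ratings might be pooled and compared. The four tier names come from the abstract; everything else is an assumption for illustration: the numeric rating scale, the nested `ratings` layout (video, then rater, then tier), the helper names `tier_means` and `plausibility_gap`, and the gap metric itself (base-tier mean minus the mean of the three higher tiers) are hypothetical and are not necessarily the paper's scoring procedure.

```python
from enum import IntEnum
from statistics import mean


class SPPTier(IntEnum):
    """Tiers of the Surgical Plausibility Pyramid, ordered from base (1) to apex (4)."""
    VISUAL_PERCEPTUAL = 1
    INSTRUMENT_OPERATION = 2
    ENVIRONMENT_FEEDBACK = 3
    SURGICAL_INTENT = 4


# Hypothetical layout: ratings[video_id][rater_id][tier] = numeric score.
# The rating scale is an assumption; the abstract does not specify one.
Ratings = dict[str, dict[str, dict[SPPTier, float]]]


def tier_means(ratings: Ratings) -> dict[SPPTier, float]:
    """Average each tier's scores, pooled over all videos and all raters."""
    pooled: dict[SPPTier, list[float]] = {tier: [] for tier in SPPTier}
    for per_rater in ratings.values():
        for per_tier in per_rater.values():
            for tier, score in per_tier.items():
                pooled[tier].append(score)
    # Tiers that received no ratings are omitted rather than averaged.
    return {tier: mean(scores) for tier, scores in pooled.items() if scores}


def plausibility_gap(means: dict[SPPTier, float]) -> float:
    """Hypothetical gap metric: base-tier mean minus the mean of the higher tiers.

    Raises KeyError if any tier is missing from `means`.
    """
    higher = [means[tier] for tier in SPPTier if tier > SPPTier.VISUAL_PERCEPTUAL]
    return means[SPPTier.VISUAL_PERCEPTUAL] - mean(higher)
```

Under this sketch, a gap well above zero corresponds to the pattern the abstract reports: strong scores at the base of the pyramid (visual perceptual plausibility) alongside weak scores at the instrument, environment, and intent tiers.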