How Far Are Surgeons from Surgical World Models? A Pilot Study on Zero-shot Surgical Video Generation with Expert Assessment

November 3, 2025
Authors: Zhen Chen, Qing Xu, Jinlin Wu, Biao Yang, Yuhao Zhai, Geng Guo, Jing Zhang, Yinlu Ding, Nassir Navab, Jiebo Luo
cs.AI

Abstract

Foundation models in video generation are demonstrating remarkable capabilities as potential world models for simulating the physical world. However, their application in high-stakes domains like surgery, which demand deep, specialized causal knowledge rather than general physical rules, remains a critical unexplored gap. To systematically address this challenge, we present SurgVeo, the first expert-curated benchmark for video generation model evaluation in surgery, and the Surgical Plausibility Pyramid (SPP), a novel, four-tiered framework tailored to assess model outputs from basic appearance to complex surgical strategy. On the basis of the SurgVeo benchmark, we task the advanced Veo-3 model with a zero-shot prediction task on surgical clips from laparoscopic and neurosurgical procedures. A panel of four board-certified surgeons evaluates the generated videos according to the SPP. Our results reveal a distinct "plausibility gap": while Veo-3 achieves exceptional Visual Perceptual Plausibility, it fails critically at higher levels of the SPP, including Instrument Operation Plausibility, Environment Feedback Plausibility, and Surgical Intent Plausibility. This work provides the first quantitative evidence of the chasm between visually convincing mimicry and causal understanding in surgical AI. Our findings from SurgVeo and the SPP establish a crucial foundation and roadmap for developing future models capable of navigating the complexities of specialized, real-world healthcare domains.