MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos
July 8, 2025
Authors: Rongsheng Wang, Junying Chen, Ke Ji, Zhenyang Cai, Shunian Chen, Yunjin Yang, Benyou Wang
cs.AI
Abstract
Recent advances in video generation have shown remarkable progress in
open-domain settings, yet medical video generation remains largely
underexplored. Medical videos are critical for applications such as clinical
training, education, and simulation, requiring not only high visual fidelity
but also strict medical accuracy. However, current models often produce
unrealistic or erroneous content when applied to medical prompts, largely due
to the lack of large-scale, high-quality datasets tailored to the medical
domain. To address this gap, we introduce MedVideoCap-55K, the first
large-scale, diverse, and caption-rich dataset for medical video generation. It
comprises over 55,000 curated clips spanning real-world medical scenarios,
providing a strong foundation for training generalist medical video generation
models. Built upon this dataset, we develop MedGen, which achieves leading
performance among open-source models and rivals commercial systems across
multiple benchmarks in both visual quality and medical accuracy. We hope our
dataset and model can serve as a valuable resource and help catalyze further
research in medical video generation. Our code and data are available at
https://github.com/FreedomIntelligence/MedGen