MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos

July 8, 2025
Authors: Rongsheng Wang, Junying Chen, Ke Ji, Zhenyang Cai, Shunian Chen, Yunjin Yang, Benyou Wang
cs.AI

Abstract

Recent advances in video generation have shown remarkable progress in open-domain settings, yet medical video generation remains largely underexplored. Medical videos are critical for applications such as clinical training, education, and simulation, requiring not only high visual fidelity but also strict medical accuracy. However, current models often produce unrealistic or erroneous content when applied to medical prompts, largely due to the lack of large-scale, high-quality datasets tailored to the medical domain. To address this gap, we introduce MedVideoCap-55K, the first large-scale, diverse, and caption-rich dataset for medical video generation. It comprises over 55,000 curated clips spanning real-world medical scenarios, providing a strong foundation for training generalist medical video generation models. Built upon this dataset, we develop MedGen, which achieves leading performance among open-source models and rivals commercial systems across multiple benchmarks in both visual quality and medical accuracy. We hope our dataset and model can serve as a valuable resource and help catalyze further research in medical video generation. Our code and data are available at https://github.com/FreedomIntelligence/MedGen