Efficient Medical VIE via Reinforcement Learning
June 16, 2025
Authors: Lijun Liu, Ruiyang Li, Zhaocheng Liu, Chenglin Zhu, Chong Li, Jiehan Cheng, Qiang Ju, Jian Xie
cs.AI
Abstract
Visual Information Extraction (VIE) converts unstructured document images
into structured formats like JSON, critical for medical applications such as
report analysis and online consultations. Traditional methods rely on OCR and
language models, while end-to-end multimodal models offer direct JSON
generation. However, domain-specific schemas and high annotation costs limit
their effectiveness in medical VIE. We base our approach on the Reinforcement
Learning with Verifiable Rewards (RLVR) framework to address these challenges
using only 100 annotated samples. Our approach ensures dataset diversity,
employs a balanced precision-recall reward mechanism to reduce hallucinations
and improve field coverage, and uses innovative sampling strategies to enhance
reasoning capabilities. Fine-tuning Qwen2.5-VL-7B with our RLVR method, we achieve
state-of-the-art performance on medical VIE tasks, significantly improving F1,
precision, and recall. While our models excel on tasks similar to medical
datasets, performance drops on dissimilar tasks, highlighting the need for
domain-specific optimization. Case studies further demonstrate the value of
reasoning during training and inference for VIE.
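The balanced precision-recall reward described above can be sketched as a verifiable, rule-based score over extracted JSON fields. This is a minimal illustration, assuming field-level exact matching between predicted and gold key-value pairs; the function name `vie_reward` and the F1-style balancing are assumptions for exposition, not the paper's exact formulation.

```python
def vie_reward(pred: dict, gold: dict) -> float:
    """F1-style verifiable reward: balances precision (penalizing
    hallucinated fields) against recall (rewarding field coverage)."""
    pred_items = set(pred.items())
    gold_items = set(gold.items())
    if not pred_items or not gold_items:
        return 0.0
    tp = len(pred_items & gold_items)     # correctly extracted fields
    precision = tp / len(pred_items)      # low if fields are hallucinated
    recall = tp / len(gold_items)         # low if fields are missed
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: one hallucinated field ("unit") and one missed field ("date")
gold = {"name": "A", "dose": "5mg", "date": "2025-01-01"}
pred = {"name": "A", "dose": "5mg", "unit": "tablet"}
print(round(vie_reward(pred, gold), 3))  # → 0.667 (precision = recall = 2/3)
```

Because the reward is computed deterministically from the model's JSON output against the annotation, it is directly verifiable, which is what makes it usable in an RLVR loop without a learned reward model.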