

Beyond Multiple Choice: Verifiable OpenQA for Robust Vision-Language RFT

November 21, 2025
作者: Yesheng Liu, Hao Li, Haiyu Xu, Baoqi Pei, Jiahao Wang, Mingxuan Zhao, Jingshu Zheng, Zheqi He, JG Yao, Bowen Qin, Xi Yang, Jiajun Zhang
cs.AI

Abstract

Multiple-choice question answering (MCQA) has been a popular format for both evaluation and reinforcement fine-tuning (RFT) of modern multimodal language models: its constrained output format allows simple, deterministic automatic verification. However, we find that the options may leak exploitable signals, making accuracy metrics unreliable indicators of real capability and encouraging explicit or implicit answer-guessing behaviors during RFT. We propose ReVeL (Rewrite and Verify by LLM), a framework that rewrites multiple-choice questions into open-form questions while keeping answers verifiable whenever possible. The framework categorizes questions by answer type and applies a different rewriting and verification scheme to each. For RFT, we convert 20k MCQA examples and use GRPO to fine-tune Qwen2.5-VL models. Models trained on ReVeL-OpenQA match MCQA accuracy on multiple-choice benchmarks and improve OpenQA accuracy by about six percentage points, indicating better data efficiency and more robust reward signals than MCQA-based training. When used for evaluation, ReVeL also reveals up to 20 percentage points of score inflation in MCQA benchmarks (relative to OpenQA), improves judging accuracy, and reduces both cost and latency. We will release code and data publicly.
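
The abstract only sketches the pipeline at a high level, so the following is a minimal illustrative sketch, not the authors' implementation: it shows how answer-type categorization, option-free rewriting, and type-specific verification could fit together. All helper names (`llm_rewrite`, `llm_judge`), the three-way category taxonomy, and the matching thresholds are assumptions for illustration; the paper's actual prompts and categories are not specified here.

```python
# Hypothetical sketch of a ReVeL-style rewrite-and-verify pipeline.
# Helper names and the category taxonomy are illustrative assumptions,
# not the paper's actual implementation.
import re

def categorize_answer(gold: str) -> str:
    """Bucket the gold answer so rewriting/verification can differ by type."""
    if re.fullmatch(r"-?\d+(\.\d+)?", gold.strip()):
        return "numeric"
    if len(gold.split()) <= 3:
        return "short_entity"   # names, labels, short phrases
    return "open_ended"         # hard to verify deterministically

def rewrite_to_openqa(question: str, gold: str, llm_rewrite) -> str:
    """Drop the options and ask an LLM to rephrase the stem as an
    open-form question whose unique answer is still `gold`."""
    prompt = (
        "Rewrite this multiple-choice question as an open-ended question "
        f"whose unique answer is '{gold}'. Do not mention any options.\n"
        f"Question: {question}"
    )
    return llm_rewrite(prompt)

def verify(prediction: str, gold: str, category: str, llm_judge) -> float:
    """Binary reward: deterministic match where possible, LLM judge otherwise."""
    pred, ans = prediction.strip().lower(), gold.strip().lower()
    if category == "numeric":
        try:
            return float(abs(float(pred) - float(ans)) < 1e-6)
        except ValueError:
            return 0.0
    if category == "short_entity":
        return float(pred == ans)
    # Open-ended answers fall back to an LLM judge.
    return float(llm_judge(prediction, gold))
```

In a GRPO loop, a verifier like `verify` would serve directly as the scalar reward for each sampled completion, replacing the option-letter match used in MCQA-based RFT; this is what the abstract means by keeping answers verifiable after removing the options.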