
Beyond Multiple Choice: Verifiable OpenQA for Robust Vision-Language RFT

November 21, 2025
作者: Yesheng Liu, Hao Li, Haiyu Xu, Baoqi Pei, Jiahao Wang, Mingxuan Zhao, Jingshu Zheng, Zheqi He, JG Yao, Bowen Qin, Xi Yang, Jiajun Zhang
cs.AI

Abstract

Multiple-choice question answering (MCQA) has been a popular format for the evaluation and reinforcement fine-tuning (RFT) of modern multimodal language models. Its constrained output format allows for simplified, deterministic automatic verification. However, we find that the options may leak exploitable signals, which makes accuracy metrics unreliable indicators of real capability and encourages explicit or implicit answer-guessing behavior during RFT. We propose ReVeL (Rewrite and Verify by LLM), a framework that rewrites multiple-choice questions into open-form questions while keeping answers verifiable whenever possible. The framework categorizes questions by answer type and applies a different rewriting and verification scheme to each. For RFT, we convert 20k MCQA examples and use GRPO to fine-tune Qwen2.5-VL models. Models trained on ReVeL-OpenQA match MCQA accuracy on multiple-choice benchmarks and improve OpenQA accuracy by about six percentage points, indicating better data efficiency and more robust reward signals than MCQA-based training. When used for evaluation, ReVeL also reveals up to 20 percentage points of score inflation in MCQA benchmarks (relative to OpenQA), improves judging accuracy, and reduces both cost and latency. We will release code and data publicly.
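
The abstract does not spell out the pipeline, so the following is only a minimal illustrative Python sketch of how a rewrite-and-verify step of this kind could look. The MCQAItem fields, the llm callable, the prompts, and the answer_type labels are assumptions made for illustration, not the authors' actual code or prompts.

from dataclasses import dataclass

@dataclass
class MCQAItem:
    question: str
    options: dict[str, str]  # option letter -> option text
    answer_key: str          # letter of the correct option, e.g. "B"

def rewrite_to_openqa(item: MCQAItem, llm) -> tuple[str, str]:
    # Drop the options and ask an LLM to rephrase the question so that its
    # answer is exactly the correct option text, keeping it verifiable.
    gold = item.options[item.answer_key]
    prompt = (
        "Rewrite this multiple-choice question as an open-ended question whose "
        "answer is exactly the given text, without mentioning any options.\n"
        f"Question: {item.question}\nCorrect answer: {gold}"
    )
    return llm(prompt).strip(), gold

def verify_open_answer(prediction: str, gold: str, answer_type: str, llm) -> float:
    # Binary reward: short closed-form answers (numbers, named entities, yes/no)
    # are checked by normalized string match; free-form answers fall back to an
    # LLM judge that compares the prediction against the reference.
    if answer_type in {"number", "entity", "yes_no"}:
        return float(prediction.strip().lower() == gold.strip().lower())
    judge_prompt = (
        "Does the prediction convey the same answer as the reference? "
        "Reply YES or NO.\n"
        f"Reference: {gold}\nPrediction: {prediction}"
    )
    return float(llm(judge_prompt).strip().upper().startswith("YES"))

In an RFT loop of the kind described above, such a 0/1 verification score could serve directly as the per-response reward used by GRPO.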