EvoSyn: Generalizable Evolutionary Data Synthesis for Verifiable Learning
October 20, 2025
Authors: He Du, Bowen Li, Aijun Yang, Siyang He, Qipeng Guo, Dacheng Tao
cs.AI
Abstract
Reliable, verifiable data has become a key driver of capability gains in modern language models, enabling stable reinforcement learning with verifiable rewards (RLVR) and effective distillation that transfers competence across math, coding, and agentic tasks. Yet constructing generalizable synthetic verifiable data remains difficult: generation is prone to hallucination, and weak or trivial verification artifacts fail to separate strong solutions from weak ones. Existing approaches often rely on task-specific heuristics or post-hoc filters that do not transfer across domains and lack a principled, universal evaluator of verifiability. In this work, we introduce an evolutionary, task-agnostic, strategy-guided, executably checkable data synthesis framework. Starting from minimal seed supervision, it jointly synthesizes problems, diverse candidate solutions, and verification artifacts, and it iteratively discovers strategies via a consistency-based evaluator that enforces agreement between human-annotated and strategy-induced checks. This pipeline upgrades filtering into principled synthesis: it reliably assembles coherent, verifiable training instances and generalizes without domain-specific rules. Our experiments demonstrate the effectiveness of the proposed approach under both the RLVR and model distillation training paradigms. Training on our synthesized data yields significant improvements on both LiveCodeBench and AgentBench-OS, highlighting the robust generalization of our framework.
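
The consistency-based evaluator described above can be illustrated with a small sketch: a candidate strategy is an executable check, and it survives only if its verdicts agree with the human-annotated labels on the seed set. The Python below is a minimal toy under assumed interfaces; SeedInstance, consistency, evolve_strategies, the agreement threshold, and the arithmetic seed data are all hypothetical names invented for illustration, not the paper's implementation.

```python
# Hypothetical sketch of a consistency-based evaluator: a "strategy" is an
# executable check, kept only if its verdicts agree with human-annotated
# labels on a small seed set. All names and data here are illustrative.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SeedInstance:
    problem: str
    solution: str
    human_label: bool  # minimal seed supervision: is this solution correct?


# A strategy maps (problem, candidate solution) to an accept/reject verdict.
Strategy = Callable[[str, str], bool]


def consistency(strategy: Strategy, seeds: List[SeedInstance]) -> float:
    """Fraction of seeds where the strategy's verdict matches the human label."""
    hits = sum(strategy(s.problem, s.solution) == s.human_label for s in seeds)
    return hits / len(seeds)


def evolve_strategies(candidates: List[Strategy],
                      seeds: List[SeedInstance],
                      threshold: float = 0.9) -> List[Strategy]:
    """Keep only strategies whose checks agree with human annotations."""
    return [s for s in candidates if consistency(s, seeds) >= threshold]


# Toy usage: two candidate checkers for a trivial arithmetic task.
seeds = [
    SeedInstance("1+1", "2", True),
    SeedInstance("2+2", "5", False),
]
candidates: List[Strategy] = [
    lambda p, a: str(eval(p)) == a,  # executable check: re-run the expression
    lambda p, a: True,               # trivial check; cannot separate solutions
]
surviving = evolve_strategies(candidates, seeds)
print(f"{len(surviving)} of {len(candidates)} strategies pass the consistency filter")
```

In the full framework, surviving strategies would then be applied to newly synthesized problem-solution pairs, so that verification is built into synthesis rather than applied as a post-hoc filter; note how the trivial always-accept check is rejected precisely because it fails to separate strong solutions from weak ones.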