EvoSyn: Generalizable Evolutionary Data Synthesis for Verifiable Learning
October 20, 2025
Authors: He Du, Bowen Li, Aijun Yang, Siyang He, Qipeng Guo, Dacheng Tao
cs.AI
Abstract
Reliable verifiable data has become a key driver of capability gains in
modern language models, enabling stable reinforcement learning with verifiable
rewards and effective distillation that transfers competence across math,
coding, and agentic tasks. Yet constructing generalizable synthetic verifiable
data remains difficult due to hallucination-prone generation and weak or
trivial verification artifacts that fail to separate strong from weak
solutions. Existing approaches often rely on task-specific heuristics or
post-hoc filters that do not transfer across domains and lack a principled,
universal evaluator of verifiability. In this work, we introduce an
evolutionary, task-agnostic, strategy-guided, executably checkable data
synthesis framework that, from minimal seed supervision, jointly synthesizes
problems, diverse candidate solutions, and verification artifacts, and
iteratively discovers strategies via a consistency-based evaluator that
enforces agreement between human-annotated and strategy-induced checks. This
pipeline upgrades filtering into principled synthesis: it reliably assembles
coherent, verifiable training instances and generalizes without domain-specific
rules. Our experiments demonstrate the effectiveness of the proposed approach
under both RLVR and model distillation training paradigms. The results show
that training with our synthesized data yields significant improvements on both
the LiveCodeBench and AgentBench-OS tasks, highlighting the robust
generalization of our framework.
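
The consistency-based evaluator mentioned above can be illustrated with a toy sketch. The abstract does not specify how agreement is measured, so everything here is an assumption made for illustration: the name `consistency_score`, the scoring rule (fraction of candidate solutions on which two checks return the same verdict), and the toy "square an integer" task. It is not the authors' implementation.

```python
from typing import Callable, Sequence

# Hypothetical types for illustration: a candidate solution maps int -> int,
# and a check maps a solution to a pass/fail verdict.
Solution = Callable[[int], int]
Check = Callable[[Solution], bool]


def consistency_score(
    solutions: Sequence[Solution], human_check: Check, strategy_check: Check
) -> float:
    """Fraction of candidates on which both checks give the same verdict.

    Assumed scoring rule, not taken from the paper.
    """
    if not solutions:
        raise ValueError("need at least one candidate solution")
    agree = sum(human_check(s) == strategy_check(s) for s in solutions)
    return agree / len(solutions)


# Toy task: square an integer. One correct and two flawed candidates.
def good(x: int) -> int:
    return x * x


def bad_double(x: int) -> int:
    return 2 * x


def bad_abs(x: int) -> int:
    return abs(x)


candidates = [good, bad_double, bad_abs]


def human_check(sol: Solution) -> bool:
    # Stands in for the human-annotated check (the seed supervision).
    return all(sol(x) == x * x for x in (-3, 0, 2, 5))


def weak_strategy_check(sol: Solution) -> bool:
    # A trivial artifact: every candidate passes, so it separates nothing.
    return sol(0) == 0


def strong_strategy_check(sol: Solution) -> bool:
    # A discriminative artifact built from different inputs.
    return all(sol(x) == x * x for x in (-2, 1, 3))


weak_score = consistency_score(candidates, human_check, weak_strategy_check)
strong_score = consistency_score(candidates, human_check, strong_strategy_check)
print(f"weak strategy: {weak_score:.2f}, strong strategy: {strong_score:.2f}")
```

Under this assumed rule, a strategy whose checks agree with the human-annotated verdicts scores higher than one producing trivial checks, which is the kind of signal an evolutionary search over strategies could select on.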