Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch
October 24, 2024
Authors: Yuyang Ding, Xinyu Shi, Xiaobo Liang, Juntao Li, Qiaoming Zhu, Min Zhang
cs.AI
Abstract
The availability of high-quality data is one of the most important factors in
improving the reasoning capability of LLMs. Existing works have demonstrated
the effectiveness of creating more instruction data from seed questions or
knowledge bases. Recent research indicates that continually scaling up data
synthesis from strong models (e.g., GPT-4) can further elicit reasoning
performance. Though promising, the open-source community still lacks
high-quality data at scale and scalable data synthesis methods with affordable
costs. To address this, we introduce ScaleQuest, a scalable and novel data
synthesis method that utilizes "small-size" (e.g., 7B) open-source models to
generate questions from scratch without the need for seed data with complex
augmentation constraints. With the efficient ScaleQuest, we automatically
constructed a mathematical reasoning dataset consisting of 1 million
problem-solution pairs, which is more effective than existing open-source
datasets. It can universally increase the performance of mainstream open-source
models (i.e., Mistral, Llama3, DeepSeekMath, and Qwen2-Math) by achieving 29.2%
to 46.4% gains on MATH. Notably, simply fine-tuning the Qwen2-Math-7B-Base
model with our dataset can even surpass Qwen2-Math-7B-Instruct, a strong and
well-aligned model trained on closed-source data, and proprietary models such as
GPT-4-Turbo and Claude-3.5 Sonnet.