Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions
May 26, 2025
Authors: Siqi Kou, Qingyuan Tian, Hanwen Xu, Zihao Zeng, Zhijie Deng
cs.AI
Abstract
Large language models (LLMs) have demonstrated remarkable reasoning
capabilities in math and coding, often bolstered by post-training on the
chain-of-thoughts (CoTs) generated by stronger models. However, existing
strategies for curating such training data predominantly rely on heuristics,
limiting generalizability and failing to capture subtleties underlying the data.
To address these limitations, we leverage influence functions to systematically
attribute LLMs' reasoning ability on math and coding to individual training
examples, sequences, and tokens, enabling deeper insights into effective data
characteristics. Our Influence-based Reasoning Attribution (Infra) uncovers
nontrivial cross-domain effects across math and coding tasks: high-difficulty
math examples improve both math and code reasoning, while low-difficulty code
tasks most effectively benefit code reasoning. Based on these findings, we
introduce a simple yet effective dataset reweighting strategy by flipping task
difficulty, which doubles AIME24 accuracy from 10% to 20% and boosts
LiveCodeBench accuracy from 33.8% to 35.3% for Qwen2.5-7B-Instruct. Moreover,
our fine-grained attribution reveals that sequence-level exploratory behaviors
enhance reasoning performance in both math and code, and that token-level
influence patterns are distinct for math and code reasoning: the former prefers
natural-language logic connectors, while the latter emphasizes
structural syntax.
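
For context, influence-function attribution of the kind described above is usually built on the classic first-order approximation of Koh and Liang (2017); the sketch below shows that standard form and is not necessarily the exact estimator used in this paper:

\[
\mathcal{I}(z_{\text{train}}, z_{\text{query}}) \;=\; -\,\nabla_\theta \mathcal{L}(z_{\text{query}}, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1}\, \nabla_\theta \mathcal{L}(z_{\text{train}}, \hat{\theta}),
\qquad
H_{\hat{\theta}} \;=\; \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^{2} \mathcal{L}(z_i, \hat{\theta}),
\]

which approximates how the loss on a query example changes when the training example is infinitesimally upweighted. At LLM scale the Hessian is typically replaced by a damped Gauss-Newton or EK-FAC style approximation, and sequence- or token-level scores can be obtained by restricting the training-side gradient to the corresponding span.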
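The "flipping task difficulty" reweighting can be read, per the findings above, as sampling high-difficulty math and low-difficulty code examples more often in the post-training mixture. Below is a minimal illustrative sketch in Python; the record fields (domain, difficulty) and the boost factor are assumptions for exposition, not the paper's implementation:

import random

def reweight_by_flipped_difficulty(examples, boost=2.0):
    """Assign larger sampling weights to hard math and easy code examples.

    The field names and the uniform boost factor are illustrative assumptions.
    """
    weighted = []
    for ex in examples:
        hard_math = ex["domain"] == "math" and ex["difficulty"] == "high"
        easy_code = ex["domain"] == "code" and ex["difficulty"] == "low"
        weighted.append((ex, boost if (hard_math or easy_code) else 1.0))
    return weighted

def sample_mixture(weighted, k, seed=0):
    """Draw k training examples according to the assigned weights."""
    rng = random.Random(seed)
    examples, weights = zip(*weighted)
    return rng.choices(list(examples), weights=weights, k=k)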