Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions
May 26, 2025
Authors: Siqi Kou, Qingyuan Tian, Hanwen Xu, Zihao Zeng, Zhijie Deng
cs.AI
Abstract
Large language models (LLMs) have demonstrated remarkable reasoning
capabilities in math and coding, often bolstered by post-training on the
chain-of-thoughts (CoTs) generated by stronger models. However, existing
strategies for curating such training data predominantly rely on heuristics,
limiting generalizability and failing to capture subtleties underlying the data.
To address these limitations, we leverage influence functions to systematically
attribute LLMs' reasoning ability on math and coding to individual training
examples, sequences, and tokens, enabling deeper insights into effective data
characteristics. Our Influence-based Reasoning Attribution (Infra) uncovers
nontrivial cross-domain effects across math and coding tasks: high-difficulty
math examples improve both math and code reasoning, while low-difficulty code
tasks most effectively benefit code reasoning. Based on these findings, we
introduce a simple yet effective dataset reweighting strategy by flipping task
difficulty, which doubles AIME24 accuracy from 10% to 20% and boosts
LiveCodeBench accuracy from 33.8% to 35.3% for Qwen2.5-7B-Instruct. Moreover,
our fine-grained attribution reveals that sequence-level exploratory behaviors
enhance reasoning performance in both math and code, and that token-level
influence patterns are distinct for math and code reasoning: the former prefers
natural-language logic connectors, while the latter emphasizes
structural syntax.
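
For context, influence-function attribution of the kind described above is usually built on the classic first-order approximation of Koh and Liang (2017); the sketch below shows that standard form and is not necessarily the exact estimator used in this paper:

\[
\mathcal{I}(z_{\text{train}}, z_{\text{query}}) \;=\; -\,\nabla_\theta \mathcal{L}(z_{\text{query}}, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1}\, \nabla_\theta \mathcal{L}(z_{\text{train}}, \hat{\theta}),
\qquad
H_{\hat{\theta}} \;=\; \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^{2} \mathcal{L}(z_i, \hat{\theta}),
\]

which approximates how the loss on a query example changes when the training example is infinitesimally upweighted. At LLM scale the Hessian is typically replaced by a damped Gauss-Newton or EK-FAC style approximation, and sequence- or token-level scores can be obtained by restricting the training-side gradient to the corresponding span.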
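The "flipping task difficulty" reweighting can be read, per the findings above, as sampling high-difficulty math and low-difficulty code examples more often in the post-training mixture. Below is a minimal illustrative sketch in Python; the record fields (domain, difficulty) and the boost factor are assumptions for exposition, not the paper's implementation:

import random

def reweight_by_flipped_difficulty(examples, boost=2.0):
    """Assign larger sampling weights to hard math and easy code examples.

    The field names and the uniform boost factor are illustrative assumptions.
    """
    weighted = []
    for ex in examples:
        hard_math = ex["domain"] == "math" and ex["difficulty"] == "high"
        easy_code = ex["domain"] == "code" and ex["difficulty"] == "low"
        weighted.append((ex, boost if (hard_math or easy_code) else 1.0))
    return weighted

def sample_mixture(weighted, k, seed=0):
    """Draw k training examples according to the assigned weights."""
    rng = random.Random(seed)
    examples, weights = zip(*weighted)
    return rng.choices(list(examples), weights=weights, k=k)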