Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions
May 26, 2025
作者: Siqi Kou, Qingyuan Tian, Hanwen Xu, Zihao Zeng, Zhijie Deng
cs.AI
Abstract
Large language models (LLMs) have demonstrated remarkable reasoning
capabilities in math and coding, often bolstered by post-training on the
chain-of-thoughts (CoTs) generated by stronger models. However, existing
strategies for curating such training data predominantly rely on heuristics,
limiting generalizability and failing to capture subtleties underlying the data.
To address these limitations, we leverage influence functions to systematically
attribute LLMs' reasoning ability on math and coding to individual training
examples, sequences, and tokens, enabling deeper insights into effective data
characteristics. Our Influence-based Reasoning Attribution (Infra) uncovers
nontrivial cross-domain effects across math and coding tasks: high-difficulty
math examples improve both math and code reasoning, while low-difficulty code
tasks most effectively benefit code reasoning. Based on these findings, we
introduce a simple yet effective dataset reweighting strategy by flipping task
difficulty, which doubles AIME24 accuracy from 10% to 20% and boosts
LiveCodeBench accuracy from 33.8% to 35.3% for Qwen2.5-7B-Instruct. Moreover,
our fine-grained attribution reveals that sequence-level exploratory
behaviors enhance reasoning performance in both math and code, and that
token-level influence patterns differ between math and code reasoning: the
former favors natural language logic connectors while the latter emphasizes
structural syntax.
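The abstract attributes reasoning ability to individual training examples via influence functions. As a minimal toy sketch of the classical influence-function formulation (negative query gradient, inverse Hessian, training gradient) on a least-squares model — not the paper's actual Infra pipeline, which operates on LLM training sequences; all variable names here are illustrative:

```python
import numpy as np

# Toy linear-regression setup: estimate the influence of each training
# example on a held-out query example via
#   I(z_train, z_query) = -grad_query^T H^{-1} grad_train
# This is a sketch under simplifying assumptions, not the paper's method.

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))            # 20 training examples, 3 features
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=20)

# Fit least squares: total loss L(w) = 0.5 * ||Xw - y||^2
w = np.linalg.lstsq(X, y, rcond=None)[0]

H = X.T @ X                             # Hessian of the total loss
H_inv = np.linalg.inv(H)

x_q = rng.normal(size=3)                # a query (test) example
y_q = x_q @ w_true
grad_q = (x_q @ w - y_q) * x_q          # gradient of the query loss at w

# Per-example training gradients: residual_i * x_i, shape (20, 3)
residuals = X @ w - y
train_grads = residuals[:, None] * X

# Influence of each training example on the query loss; negative values
# mean upweighting that example would reduce the query loss.
influence = -train_grads @ H_inv @ grad_q
most_helpful = int(np.argmin(influence))
```

For LLM-scale models the Hessian inverse is approximated (e.g. with iterative or Kronecker-factored methods) rather than computed exactly as in this toy example.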