Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions
May 26, 2025
作者: Siqi Kou, Qingyuan Tian, Hanwen Xu, Zihao Zeng, Zhijie Deng
cs.AI
Abstract
Large language models (LLMs) have demonstrated remarkable reasoning
capabilities in math and coding, often bolstered by post-training on the
chain-of-thoughts (CoTs) generated by stronger models. However, existing
strategies for curating such training data predominantly rely on heuristics,
limiting generalizability and failing to capture subtleties underlying the data.
To address these limitations, we leverage influence functions to systematically
attribute LLMs' reasoning ability on math and coding to individual training
examples, sequences, and tokens, enabling deeper insights into effective data
characteristics. Our Influence-based Reasoning Attribution (Infra) uncovers
nontrivial cross-domain effects across math and coding tasks: high-difficulty
math examples improve both math and code reasoning, while low-difficulty code
tasks most effectively benefit code reasoning. Based on these findings, we
introduce a simple yet effective dataset reweighting strategy by flipping task
difficulty, which doubles AIME24 accuracy from 10% to 20% and boosts
LiveCodeBench accuracy from 33.8% to 35.3% for Qwen2.5-7B-Instruct. Moreover,
our fine-grained attribution reveals that sequence-level exploratory
behaviors enhance reasoning performance in both math and code, and that
token-level influence patterns differ between math and code reasoning: the
former favors natural language logic connectors while the latter emphasizes
structural syntax.
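The abstract attributes reasoning ability to individual training examples via influence functions. As a minimal toy sketch of the classical influence-function formulation (negative query gradient, inverse Hessian, training gradient) on a least-squares model — not the paper's actual Infra pipeline, which operates on LLM training sequences; all variable names here are illustrative:

```python
import numpy as np

# Toy linear-regression setup: estimate the influence of each training
# example on a held-out query example via
#   I(z_train, z_query) = -grad_query^T H^{-1} grad_train
# This is a sketch under simplifying assumptions, not the paper's method.

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))            # 20 training examples, 3 features
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=20)

# Fit least squares: total loss L(w) = 0.5 * ||Xw - y||^2
w = np.linalg.lstsq(X, y, rcond=None)[0]

H = X.T @ X                             # Hessian of the total loss
H_inv = np.linalg.inv(H)

x_q = rng.normal(size=3)                # a query (test) example
y_q = x_q @ w_true
grad_q = (x_q @ w - y_q) * x_q          # gradient of the query loss at w

# Per-example training gradients: residual_i * x_i, shape (20, 3)
residuals = X @ w - y
train_grads = residuals[:, None] * X

# Influence of each training example on the query loss; negative values
# mean upweighting that example would reduce the query loss.
influence = -train_grads @ H_inv @ grad_q
most_helpful = int(np.argmin(influence))
```

For LLM-scale models the Hessian inverse is approximated (e.g. with iterative or Kronecker-factored methods) rather than computed exactly as in this toy example.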