Schoenfeld's Anatomy of Mathematical Reasoning by Language Models
December 23, 2025
Authors: Ming Li, Chenrui Fan, Yize Cheng, Soheil Feizi, Tianyi Zhou
cs.AI
Abstract
Large language models increasingly expose reasoning traces, yet their underlying cognitive structure and steps remain difficult to identify and analyze beyond surface-level statistics. We adopt Schoenfeld's Episode Theory as an inductive, intermediate-scale lens and introduce ThinkARM (Anatomy of Reasoning in Models), a scalable framework that explicitly abstracts reasoning traces into functional reasoning steps such as Analysis, Explore, Implement, and Verify. When applied to mathematical problem solving by diverse models, this abstraction reveals reproducible thinking dynamics and structural differences between reasoning and non-reasoning models that are not apparent from token-level views. We further present two diagnostic case studies showing that exploration functions as a critical branching step associated with correctness, and that efficiency-oriented methods selectively suppress evaluative feedback steps rather than uniformly shortening responses. Together, our results demonstrate that episode-level representations make reasoning steps explicit, enabling systematic analysis of how reasoning is structured, stabilized, and altered in modern language models.
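The abstract does not specify how ThinkARM assigns episode labels, so the following is only a minimal Python sketch of what an episode-level representation of a reasoning trace could look like. The `Episode` enum, `Segment` dataclass, `label_segment` keyword heuristic, and `episode_profile` function are hypothetical illustrations, not the authors' implementation.

```python
# Illustrative sketch (not the authors' ThinkARM pipeline): represent a
# reasoning trace as a sequence of Schoenfeld-style episodes and compute
# simple episode-level statistics such as the share of exploration and
# verification steps.

from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Episode(Enum):
    ANALYSIS = "Analysis"
    EXPLORE = "Explore"
    IMPLEMENT = "Implement"
    VERIFY = "Verify"
    OTHER = "Other"


@dataclass
class Segment:
    text: str          # one contiguous chunk of the reasoning trace
    episode: Episode   # its functional label


def label_segment(text: str) -> Episode:
    """Toy keyword heuristic standing in for a learned or prompted labeler."""
    lowered = text.lower()
    if any(k in lowered for k in ("let's try", "alternatively", "what if")):
        return Episode.EXPLORE
    if any(k in lowered for k in ("check", "verify", "confirm")):
        return Episode.VERIFY
    if any(k in lowered for k in ("compute", "substitute", "solve")):
        return Episode.IMPLEMENT
    if any(k in lowered for k in ("the problem asks", "we are given")):
        return Episode.ANALYSIS
    return Episode.OTHER


def episode_profile(trace_chunks: List[str]) -> Dict[str, float]:
    """Fraction of trace segments assigned to each episode type."""
    segments = [Segment(t, label_segment(t)) for t in trace_chunks]
    total = max(len(segments), 1)
    return {e.value: sum(s.episode is e for s in segments) / total for e in Episode}


if __name__ == "__main__":
    chunks = [
        "The problem asks for the number of integer solutions.",
        "Let's try a small case first to see a pattern.",
        "Compute the sum for n = 3 and substitute back.",
        "Check whether the result satisfies the original equation.",
    ]
    print(episode_profile(chunks))
```

In a framework like the one the abstract describes, the keyword heuristic would presumably be replaced by a trained or LLM-prompted annotator; the sketch only conveys the episode-level data structure over which thinking dynamics and step distributions can be analyzed.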