MathBode: Frequency-Domain Fingerprints of LLM Mathematical Reasoning
September 27, 2025
Author: Charles L. Wang
cs.AI
Abstract
This paper presents MathBode, a dynamic diagnostic for mathematical reasoning
in large language models (LLMs). Instead of one-shot accuracy, MathBode treats
each parametric problem as a system: we drive a single parameter sinusoidally
and fit first-harmonic responses of model outputs and exact solutions. This
yields interpretable, frequency-resolved metrics: gain (amplitude tracking)
and phase (lag), which together form Bode-style fingerprints. Across five closed-form
families (linear solve, ratio/saturation, compound interest, 2x2 linear
systems, similar triangles), the diagnostic surfaces systematic low-pass
behavior and growing phase lag that accuracy alone obscures. We compare several
models against a symbolic baseline that calibrates the instrument (G ≈ 1,
φ ≈ 0). Results separate frontier from mid-tier models on
dynamics, providing a compact, reproducible protocol that complements standard
benchmarks with actionable measurements of reasoning fidelity and consistency.
We open-source the dataset and code to enable further research and adoption.
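
To make the gain and phase measurement concrete, here is a minimal sketch of a first-harmonic fit at a known drive frequency. It is not the released MathBode code. The function names, the linear-solve instance (a*x + b = c with c driven sinusoidally), and the attenuated, lagging "model" response are all assumptions made for illustration; the synthetic response is not a measured result. Gain is taken here as the ratio of fitted amplitudes and phase as the wrapped difference of fitted phases, which is one common convention and may differ from the paper's exact estimator.

```python
import numpy as np

def first_harmonic(y, t, omega):
    """Least-squares fit y ~ c + a*sin(omega*t) + b*cos(omega*t).

    Returns (amplitude, phase) of the equivalent form A*sin(omega*t + phase).
    """
    X = np.column_stack([np.ones_like(t), np.sin(omega * t), np.cos(omega * t)])
    _, a, b = np.linalg.lstsq(X, y, rcond=None)[0]
    return np.hypot(a, b), np.arctan2(b, a)

def gain_phase(y_model, y_exact, t, omega):
    """Gain and phase of the model response relative to the exact solution."""
    amp_m, ph_m = first_harmonic(y_model, t, omega)
    amp_e, ph_e = first_harmonic(y_exact, t, omega)
    gain = amp_m / amp_e
    phase = np.angle(np.exp(1j * (ph_m - ph_e)))  # wrap to (-pi, pi]
    return gain, phase

# Hypothetical linear-solve instance: a*x + b = c(t), with c driven sinusoidally.
t = np.arange(64, dtype=float)
omega = 2 * np.pi * 4 / 64            # 4 cycles over the 64-step sweep (assumed)
a_coef, b_coef = 2.0, 1.0
c = 7.0 + 1.0 * np.sin(omega * t)
x_exact = (c - b_coef) / a_coef       # exact solution: 3 + 0.5*sin(omega*t)

# Symbolic baseline: reproduces the exact solution, so G = 1 and phi = 0.
g_base, phi_base = gain_phase(x_exact, x_exact, t, omega)

# Synthetic stand-in for an LLM: 80% amplitude tracking, 0.3 rad lag (assumed).
x_model = 3.0 + 0.8 * 0.5 * np.sin(omega * t - 0.3)
g_model, phi_model = gain_phase(x_model, x_exact, t, omega)

print(f"baseline: G={g_base:.3f}, phi={phi_base:.3f}")
print(f"model:    G={g_model:.3f}, phi={phi_model:.3f}")
```

Repeating the fit over a range of drive frequencies and plotting gain and phase against frequency would give the Bode-style curve; under this convention, low-pass behavior appears as gain falling below 1 and phase becoming more negative as the frequency rises.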