MathBode: Frequency-Domain Fingerprints of LLM Mathematical Reasoning

September 27, 2025
Author: Charles L. Wang
cs.AI

Abstract

This paper presents MathBode, a dynamic diagnostic for mathematical reasoning in large language models (LLMs). Instead of one-shot accuracy, MathBode treats each parametric problem as a system: we drive a single parameter sinusoidally and fit first-harmonic responses of model outputs and exact solutions. This yields interpretable, frequency-resolved metrics, gain (amplitude tracking) and phase (lag), that form Bode-style fingerprints. Across five closed-form families (linear solve, ratio/saturation, compound interest, 2x2 linear systems, similar triangles), the diagnostic surfaces systematic low-pass behavior and growing phase lag that accuracy alone obscures. We compare several models against a symbolic baseline that calibrates the instrument (G ≈ 1, φ ≈ 0). Results separate frontier from mid-tier models on dynamics, providing a compact, reproducible protocol that complements standard benchmarks with actionable measurements of reasoning fidelity and consistency. We open-source the dataset and code to enable further research and adoption.
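The gain and phase measurement described in the abstract can be illustrated with a minimal sketch. It is not the authors' released code: the abstract does not specify the fitting procedure, so this example assumes uniform sampling over a whole number of drive cycles, which lets the first harmonic be recovered by projecting onto sine and cosine. The names `exact_solution`, `lagging_model`, `first_harmonic`, and `gain_phase`, the toy linear family `3*p + 2`, and all numeric settings are hypothetical; in a real run the second series would come from parsed LLM answers rather than a synthetic stand-in.

```python
import math

# Assumed sampling setup (hypothetical values): N samples covering K full cycles.
N = 64
K = 4
OMEGA = 2.0 * math.pi * K / N   # drive frequency in radians per sample
P0, AMP = 10.0, 2.0             # parameter offset and drive amplitude

def exact_solution(p):
    """Toy closed-form family standing in for a parametric problem."""
    return 3.0 * p + 2.0

def lagging_model(t, gain=0.5, delay=2):
    """Synthetic 'model' that tracks the drive with reduced amplitude and a delay."""
    p_seen = P0 + gain * AMP * math.sin(OMEGA * (t - delay))
    return exact_solution(p_seen)

def first_harmonic(y, omega):
    """Fit y[t] ~ c + a*sin(omega*t) + b*cos(omega*t); return (amplitude, phase).

    Over a whole number of cycles the sine and cosine are orthogonal to each
    other and to the constant term, so the least-squares coefficients reduce
    to simple projections.
    """
    n = len(y)
    a = 2.0 / n * sum(v * math.sin(omega * t) for t, v in enumerate(y))
    b = 2.0 / n * sum(v * math.cos(omega * t) for t, v in enumerate(y))
    return math.hypot(a, b), math.atan2(b, a)

def gain_phase(y_model, y_true, omega):
    """Gain = amplitude ratio; phase = model phase minus truth phase, wrapped."""
    amp_m, ph_m = first_harmonic(y_model, omega)
    amp_t, ph_t = first_harmonic(y_true, omega)
    dphi = ph_m - ph_t
    dphi = math.atan2(math.sin(dphi), math.cos(dphi))  # wrap to (-pi, pi]
    return amp_m / amp_t, dphi

# Exact answers under the sinusoidal drive.
y_true = [exact_solution(P0 + AMP * math.sin(OMEGA * t)) for t in range(N)]
# Synthetic model answers (would be parsed LLM outputs in practice).
y_lag = [lagging_model(t) for t in range(N)]

# Symbolic-baseline check: comparing the exact series with itself.
g_base, phi_base = gain_phase(y_true, y_true, OMEGA)
# Attenuated, delayed responder: gain below 1 and negative phase (lag).
g_lag, phi_lag = gain_phase(y_lag, y_true, OMEGA)

print(f"baseline: G={g_base:.3f}, phi={phi_base:.3f} rad")
print(f"lagging:  G={g_lag:.3f}, phi={phi_lag:.3f} rad")
```

Repeating the measurement over several values of `K` would trace gain and phase against frequency, which is the Bode-style fingerprint the abstract refers to. In this sketch a negative phase denotes lag; whether the paper uses the same sign convention is not stated in the abstract.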