MathBode: Frequency-Domain Fingerprints of LLM Mathematical Reasoning

Abstract

This paper presents MathBode, a dynamic diagnostic for mathematical reasoning in large language models (LLMs). Instead of one-shot accuracy, MathBode treats each parametric problem as a system: we drive a single parameter sinusoidally and fit the first-harmonic responses of the model outputs and of the exact solutions. This yields interpretable, frequency-resolved metrics, gain (amplitude tracking) and phase (lag), that form Bode-style fingerprints. Across five closed-form families (linear solve, ratio/saturation, compound interest, 2x2 linear systems, similar triangles), the diagnostic surfaces systematic low-pass behavior and growing phase lag that accuracy alone obscures. We compare several models against a symbolic baseline that calibrates the instrument (G ≈ 1, φ ≈ 0). The results separate frontier from mid-tier models on dynamics, providing a compact, reproducible protocol that complements standard benchmarks with actionable measurements of reasoning fidelity and consistency. We open-source the dataset and code to enable further research and adoption.
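As a rough illustration of the measurement described in the abstract, the sketch below drives the parameter c of a linear-solve problem a·x + b = c sinusoidally, fits the first harmonic of two output sequences, and reports gain and phase relative to the exact solution. Everything specific here is an assumption made for the example: the function names `first_harmonic` and `gain_and_phase`, the constants, and the `toy` sequence (an attenuated, one-step-delayed stand-in for a model's answers). None of it is taken from the released MathBode code, and the fit is a plain sine/cosine projection over a whole number of drive cycles.

```python
import math


def first_harmonic(signal, cycles):
    """Amplitude and phase of the component at `cycles` cycles per record.

    Assumes `cycles` is an integer with 0 < cycles < len(signal) / 2, so the
    sine and cosine projections below are exactly orthogonal.
    """
    n = len(signal)
    mean = sum(signal) / n
    w = 2.0 * math.pi * cycles / n
    # Project the mean-removed signal onto sin and cos at the drive frequency.
    a = 2.0 / n * sum((y - mean) * math.sin(w * t) for t, y in enumerate(signal))
    b = 2.0 / n * sum((y - mean) * math.cos(w * t) for t, y in enumerate(signal))
    # signal ~ mean + A*sin(w*t + phi), with a = A*cos(phi) and b = A*sin(phi)
    return math.hypot(a, b), math.atan2(b, a)


def gain_and_phase(model_out, exact_out, cycles):
    """Gain (amplitude ratio) and phase difference, model relative to exact."""
    amp_m, ph_m = first_harmonic(model_out, cycles)
    amp_e, ph_e = first_harmonic(exact_out, cycles)
    gain = amp_m / amp_e
    # Wrap the phase difference into [-pi, pi].
    phase = math.remainder(ph_m - ph_e, 2.0 * math.pi)
    return gain, phase


# Hypothetical linear-solve family: a*x + b = c, with c driven sinusoidally.
N, CYCLES = 64, 4
A_COEF, B_COEF, C0, EPS = 2.0, 1.0, 7.0, 0.5


def exact_x(t):
    c = C0 + EPS * math.sin(2.0 * math.pi * CYCLES * t / N)
    return (c - B_COEF) / A_COEF


exact = [exact_x(t) for t in range(N)]

# Toy stand-in for model answers (an assumption, not real model output):
# 80% of the true swing, responding one step late.
x_mean = (C0 - B_COEF) / A_COEF
toy = [x_mean + 0.8 * (exact_x(t - 1) - x_mean) for t in range(N)]

baseline_gain, baseline_phase = gain_and_phase(exact, exact, CYCLES)
toy_gain, toy_phase = gain_and_phase(toy, exact, CYCLES)
print(f"baseline: G={baseline_gain:.3f}, phi={baseline_phase:.3f} rad")
print(f"toy:      G={toy_gain:.3f}, phi={toy_phase:.3f} rad")
```

Comparing the exact solution with itself plays the role of the symbolic baseline and returns G ≈ 1, φ ≈ 0. The toy sequence comes out with a gain of about 0.8 and a phase of about -π/8 rad, which corresponds to one step of lag at 4 cycles per 64 samples. Repeating the fit over several drive frequencies would trace out a Bode-style curve for each problem family.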

