MathBode: Frequency-Domain Fingerprints of LLM Mathematical Reasoning

Abstract

This paper presents MathBode, a dynamic diagnostic for mathematical reasoning in large language models (LLMs). Instead of measuring one-shot accuracy, MathBode treats each parametric problem as a system: we drive a single parameter sinusoidally and fit the first-harmonic responses of the model outputs and of the exact solutions. This yields interpretable, frequency-resolved metrics, gain (amplitude tracking) and phase (lag), that form Bode-style fingerprints. Across five closed-form problem families (linear solve, ratio/saturation, compound interest, 2x2 linear systems, similar triangles), the diagnostic surfaces systematic low-pass behavior and growing phase lag that accuracy alone obscures. We compare several models against a symbolic baseline that calibrates the instrument (G ≈ 1, φ ≈ 0). The results separate frontier from mid-tier models on dynamics and provide a compact, reproducible protocol that complements standard benchmarks with actionable measurements of reasoning fidelity and consistency. We open-source the dataset and code to enable further research and adoption.
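
As an illustration of the measurement described above, the following is a minimal sketch of a first-harmonic fit, not the authors' released code. It assumes a linear-solve problem a*x + b = c whose right-hand side c is driven sinusoidally over an integer number of cycles. The function names (`first_harmonic`, `gain_and_phase`) and the attenuated, lagged "mock solver" are hypothetical stand-ins invented for this example; the mock solver is synthetic data, not an LLM result.

```python
import math


def first_harmonic(signal, cycles):
    """Project `signal` onto one sinusoid with `cycles` full periods per window.

    Returns (amplitude, phase) such that the fitted component is
    amplitude * sin(w * t + phase), with w = 2*pi*cycles/len(signal).
    Assumes an integer number of cycles with 0 < cycles < len(signal)/2,
    so the sine and cosine projections are exactly orthogonal.
    """
    n = len(signal)
    mean = sum(signal) / n
    w = 2.0 * math.pi * cycles / n
    a = 2.0 / n * sum((y - mean) * math.cos(w * t) for t, y in enumerate(signal))
    b = 2.0 / n * sum((y - mean) * math.sin(w * t) for t, y in enumerate(signal))
    return math.hypot(a, b), math.atan2(a, b)


def gain_and_phase(model_out, exact_out, cycles):
    """Gain = amplitude ratio; phase = model phase minus exact phase (radians)."""
    amp_m, ph_m = first_harmonic(model_out, cycles)
    amp_e, ph_e = first_harmonic(exact_out, cycles)
    dphi = (ph_m - ph_e + math.pi) % (2.0 * math.pi) - math.pi  # wrap to [-pi, pi)
    return amp_m / amp_e, dphi


# Hypothetical sweep: solve 2*x + 1 = c_t with c_t = 10 + 2*sin(w*t).
N, CYCLES = 64, 4
W = 2.0 * math.pi * CYCLES / N
c = [10.0 + 2.0 * math.sin(W * t) for t in range(N)]
exact = [(ct - 1.0) / 2.0 for ct in c]  # closed-form solution x = (c - b) / a

# Symbolic baseline: reproduces the exact solution, so it calibrates the fit.
baseline_gain, baseline_phase = gain_and_phase(exact, exact, CYCLES)

# Synthetic mock solver (assumption): tracks with 80% amplitude, 0.3 rad lag.
mock = [4.5 + 0.8 * math.sin(W * t - 0.3) for t in range(N)]
mock_gain, mock_phase = gain_and_phase(mock, exact, CYCLES)

print(f"baseline: G={baseline_gain:.3f}, phi={baseline_phase:.3f}")
print(f"mock:     G={mock_gain:.3f}, phi={mock_phase:.3f}")
```

Repeating the fit over several values of `CYCLES` and plotting gain and phase against frequency would give a Bode-style curve for one problem family; a gain that falls with frequency corresponds to the low-pass behavior mentioned in the abstract.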
