MathBode: Frequency-Domain Fingerprints of LLM Mathematical Reasoning

Abstract

This paper presents MathBode, a dynamic diagnostic for mathematical reasoning in large language models (LLMs). Instead of measuring one-shot accuracy, MathBode treats each parametric problem as a system: a single parameter is driven sinusoidally, and first-harmonic responses are fitted to both the model outputs and the exact solutions. This yields two interpretable, frequency-resolved metrics, gain (amplitude tracking) and phase (lag), which together form Bode-style fingerprints. Across five closed-form problem families (linear solve, ratio/saturation, compound interest, 2x2 linear systems, similar triangles), the diagnostic surfaces systematic low-pass behavior and growing phase lag that accuracy alone obscures. Several models are compared against a symbolic baseline that calibrates the instrument (G ≈ 1, φ ≈ 0). The results separate frontier from mid-tier models on dynamics, providing a compact, reproducible protocol that complements standard benchmarks with actionable measurements of reasoning fidelity and consistency. The dataset and code are open-sourced to enable further research and adoption.
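The gain and phase measurement described in the abstract can be illustrated with a minimal sketch. It is not the authors' released code: the function names, the parameter values, the choice of driving `c` in a linear-solve problem `a*x + b = c`, and the synthetic "model" response (attenuated to 0.8 and lagging by 0.3 rad) are all assumptions made for illustration. The sketch also assumes uniform sampling over a whole number of periods, in which case a Fourier projection coincides with a least-squares first-harmonic fit, and it uses the convention that a negative phase means lag.

```python
import cmath
import math


def first_harmonic(samples, cycles):
    """Complex amplitude of the component at `cycles` periods per record.

    Assumes uniform sampling over a whole number of periods, so a plain
    Fourier projection equals the least-squares sinusoid fit.
    """
    n = len(samples)
    mean = sum(samples) / n  # remove the DC offset before projecting
    acc = 0j
    for t, value in enumerate(samples):
        acc += (value - mean) * cmath.exp(-2j * math.pi * cycles * t / n)
    return 2 * acc / n


def gain_phase(model_out, exact_out, cycles):
    """Gain and phase (degrees) of the model response relative to the exact one."""
    ratio = first_harmonic(model_out, cycles) / first_harmonic(exact_out, cycles)
    return abs(ratio), math.degrees(cmath.phase(ratio))


# Hypothetical linear-solve family: a*x + b = c, so x = (c - b) / a.
a, b, c0, eps = 2.0, 1.0, 5.0, 0.5
n, cycles = 64, 4
theta = [2 * math.pi * cycles * t / n for t in range(n)]

# Exact solution as the parameter c is driven sinusoidally around c0.
exact = [(c0 + eps * math.sin(th) - b) / a for th in theta]

# Stand-in for parsed model answers: attenuated to 0.8 and lagging by 0.3 rad.
model = [(c0 - b) / a + 0.8 * (eps / a) * math.sin(th - 0.3) for th in theta]

gain, phase_deg = gain_phase(model, exact, cycles)
base_gain, base_phase = gain_phase(exact, exact, cycles)
print(f"model:    G={gain:.3f}, phi={phase_deg:.2f} deg")
print(f"baseline: G={base_gain:.3f}, phi={base_phase:.2f} deg")
```

Under these assumptions the synthetic response should recover a gain of about 0.8 and a phase of about -0.3 rad, while feeding the exact solution through the same routine plays the role of the symbolic baseline with G ≈ 1 and φ ≈ 0. Repeating the measurement at several values of `cycles` would trace out one Bode-style curve per model.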
