The Shape of Addition: The Geometric Structure of Arithmetic in Large Language Models

Abstract

Large Language Models exhibit paradoxical fragility in basic arithmetic, which suggests a disconnect between their internal computation and their discrete output. By analyzing the geometry of the residual stream during multi-operand addition, we identify the Iso-Raw-Sum Trajectory (IRST), a geometric structure in which representations are anchored by semantic digits and modulated by continuous carry fibers. To explain this geometry, we propose the Noisy Quantization Model, which frames arithmetic errors as Geometric Slippages: internal neural noise pushes a continuous, latent Carry Potential across a quantization threshold. The same framework accounts for Probe Versatility, explaining how lightweight probes can disentangle coexisting latent signals (such as ground truth versus hallucination) from a single activation vector. Finally, we validate these insights with a geometric consistency check that detects and corrects these quantization failures during inference. Our code is available at https://github.com/RL-MIND/Shape-of-Addition.
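The noisy-quantization idea can be illustrated with a toy simulation. The sketch below is not the authors' implementation, and every modeling choice in it is an assumption made for illustration: the carry potential is taken to be the raw column sum divided by 10, the internal noise is Gaussian, and quantization is a floor. The function names `carry_potential`, `quantize`, and `slippage_rate` are hypothetical. Under these assumptions, a potential that sits close to a threshold slips far more often than one in the middle of a quantization bin.

```python
import math
import random


def carry_potential(raw_sum: int) -> float:
    """Continuous stand-in for the latent carry (assumption: raw column sum / 10)."""
    return raw_sum / 10.0


def quantize(potential: float) -> int:
    """Discretize the potential into an integer carry (assumption: floor thresholds)."""
    return math.floor(potential)


def slippage_rate(raw_sum: int, noise_std: float,
                  trials: int = 10_000, seed: int = 0) -> float:
    """Fraction of trials in which Gaussian noise changes the quantized carry."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    potential = carry_potential(raw_sum)
    true_carry = quantize(potential)
    slips = 0
    for _ in range(trials):
        noisy = potential + rng.gauss(0.0, noise_std)
        if quantize(noisy) != true_carry:
            slips += 1
    return slips / trials


if __name__ == "__main__":
    # Raw sums near a multiple of 10 lie close to a quantization threshold.
    for raw_sum in (20, 21, 25, 29):
        rate = slippage_rate(raw_sum, noise_std=0.05)
        print(f"raw_sum={raw_sum} slippage_rate={rate:.3f}")
```

In this toy setting a raw sum of 20 sits exactly on a threshold, so roughly half of the noisy samples fall into the wrong bin, while a raw sum of 25 is ten standard deviations from either threshold and essentially never slips.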