Counting as a minimal probe of language model reliability

Abstract

Large language models perform strongly on benchmarks in mathematical reasoning, coding and document analysis, suggesting a broad ability to follow instructions. However, it remains unclear whether such success reflects general logical competence, repeated application of learned procedures, or pattern matching that mimics rule execution. We investigate this question by introducing Stable Counting Capacity, an assay in which models count repeated symbols until failure. The assay removes knowledge dependencies, semantics and ambiguity from evaluation, avoids lexical and tokenization confounds, and provides a direct measure of procedural reliability beyond standard knowledge-based benchmarks. Here we show, across more than 100 model variants, that Stable Counting Capacity remains far below advertised context limits. Model behavior is consistent neither with open-ended logic nor with stable application of a learned rule, but instead with use of a finite set of count-like internal states, analogous to counting on fingers. Once this resource is exhausted, the appearance of rule following disappears and exact execution collapses into guessing, even with additional test-time compute. These findings show that fluent performance in current language models does not guarantee general, reliable rule following.
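The counting-until-failure assay can be illustrated with a minimal sketch. Everything in it is an assumption for illustration rather than the study's protocol: the `Model` interface (any function from a prompt string to a reply string), the space-separated prompt in the hypothetical `make_prompt`, and the reading of "stable" as "every length from 1 to n is answered exactly on every trial". The toy `finger_counter` is a hypothetical stand-in for a model with a fixed number of count-like states; it is not a model of any real LLM.

```python
import random
from typing import Callable

# Hypothetical interface (assumption): a model is any function that maps a
# prompt string to a reply string. A real LLM client call would go here.
Model = Callable[[str], str]


def make_prompt(symbol: str, n: int) -> str:
    """Build a counting prompt whose last line is a run of n copies of `symbol`."""
    # Space-separated copies; this formatting is an assumption, not the study's prompt.
    run = " ".join([symbol] * n)
    return (
        f"How many times does '{symbol}' appear on the next line? "
        f"Answer with a number only.\n{run}"
    )


def counts_exactly(model: Model, symbol: str, n: int, trials: int) -> bool:
    """True only if the model answers exactly n on every one of `trials` attempts."""
    return all(
        model(make_prompt(symbol, n)).strip() == str(n) for _ in range(trials)
    )


def stable_counting_capacity(
    model: Model, symbol: str = "x", max_n: int = 1000, trials: int = 5
) -> int:
    """Largest n such that every length 1..n is counted exactly on every trial.

    Scans lengths upward and stops at the first failure ("count until failure").
    Requiring all shorter lengths to succeed is one assumed reading of "stable".
    """
    capacity = 0
    for n in range(1, max_n + 1):
        if not counts_exactly(model, symbol, n, trials):
            break
        capacity = n
    return capacity


def finger_counter(fingers: int, seed: int = 0) -> Model:
    """Toy stand-in (hypothetical) for a model with a fixed number of count-like states.

    Up to `fingers` symbols it counts exactly; beyond that it only guesses.
    """
    rng = random.Random(seed)

    def model(prompt: str) -> str:
        n = len(prompt.splitlines()[-1].split())
        if n <= fingers:
            return str(n)
        # Out of count-like states: return a near miss. The toy "cheats" by
        # knowing n only so that failure past `fingers` is deterministic.
        return str(n + rng.choice([-2, -1, 1, 2]))

    return model


if __name__ == "__main__":
    print(stable_counting_capacity(finger_counter(12), max_n=50))  # prints 12
```

Here `max_n` acts as a length ceiling; in a real run it might be chosen from a model's advertised context window. The linear upward scan matches "count until failure" literally; a binary search would need far fewer model calls but assumes failures are monotonic in length, which is itself an assumption.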
