The Depth Ceiling: On the Limits of Large Language Models in Discovering Latent Planning

Abstract

The viability of chain-of-thought (CoT) monitoring hinges on models being unable to reason effectively in their latent representations. Yet little is known about the limits of such latent reasoning in large language models (LLMs). We test these limits by studying whether models can discover multi-step planning strategies without supervision on intermediate steps and execute them latently, within a single forward pass. Using graph path-finding tasks that precisely control the number of required latent planning steps, we uncover a striking limitation that massive scaling does not resolve: tiny transformers trained from scratch discover strategies requiring up to three latent steps, fine-tuned GPT-4o and Qwen3-32B reach five, and GPT-5.4 attains seven under few-shot prompting. Although the maximum latent planning depth that models can learn during training is five, the discovered strategy generalizes to up to eight latent steps at test time. This reveals a dissociation between the ability to discover a latent strategy under final-answer supervision alone and the ability to execute it once discovered. If similar limits hold more broadly, strategies requiring multiple coordinated latent planning steps may need to be explicitly taught or externalized, lending credence to CoT monitoring.
