The Scaling Properties of Implicit Deductive Reasoning in Transformers

Abstract

We investigate the scaling properties of implicit deductive reasoning over Horn clauses in depth-bounded Transformers. By systematically decorrelating provability from spurious features and enforcing algorithmic alignment, we find that in sufficiently deep models with a bidirectional prefix mask, implicit reasoning approaches explicit chain-of-thought (CoT) performance across graph topologies and problem widths, though CoT remains necessary for depth extrapolation.
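To make the task concrete, the following is a minimal sketch of deciding provability over propositional Horn clauses by forward chaining. It is an illustration under stated assumptions, not the procedure or data format used in this work: the function name `forward_chain`, the `(body, head)` rule encoding, and the reading of "depth" as the number of parallel rule-application rounds needed to derive the goal are all hypothetical choices made here.

```python
def forward_chain(facts, rules, goal):
    """Return (provable, depth) for `goal` under propositional Horn rules.

    facts: iterable of atoms assumed true (derivation depth 0).
    rules: list of (body, head) pairs; body is a tuple of atoms.
    depth: number of parallel derivation rounds needed to reach the goal,
           or None if the goal is not derivable (illustrative definition).
    """
    known = set(facts)
    if goal in known:
        return True, 0
    depth = 0
    while True:
        # One round: fire every rule whose entire body is already known.
        new = {head for body, head in rules
               if head not in known and all(atom in known for atom in body)}
        if not new:
            # Fixed point reached without deriving the goal.
            return False, None
        known |= new
        depth += 1
        if goal in new:
            return True, depth


# Hypothetical toy instance: a, b are facts; c needs a and b; d needs c.
facts = {"a", "b"}
rules = [(("a", "b"), "c"), (("c",), "d"), (("e",), "f")]
print(forward_chain(facts, rules, "d"))  # (True, 2)
print(forward_chain(facts, rules, "f"))  # (False, None)
```

Under this reading, an implicit reasoner must resolve all derivation rounds within a single forward pass, whereas an explicit CoT can write out one round at a time, which gives one intuition for why depth extrapolation is the harder case.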
