Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning

Abstract

Faithful evaluation of language model capabilities is crucial for deriving actionable insights that can inform model development. However, rigorous causal evaluations in this domain face significant methodological challenges, including complex confounding effects and prohibitive computational costs associated with extensive retraining. To tackle these challenges, we propose a causal representation learning framework wherein observed benchmark performance is modeled as a linear transformation of a few latent capability factors. Crucially, these latent factors are identified as causally interrelated after appropriately controlling for the base model as a common confounder. Applying this approach to a comprehensive dataset encompassing over 1500 models evaluated across six benchmarks from the Open LLM Leaderboard, we identify a concise three-node linear causal structure that reliably explains the observed performance variations. Further interpretation of this causal structure provides substantial scientific insights beyond simple numerical rankings: specifically, we reveal a clear causal direction starting from general problem-solving capabilities, advancing through instruction-following proficiency, and culminating in mathematical reasoning ability. Our results underscore the essential role of carefully controlling base model variations during evaluation, a step critical to accurately uncovering the underlying causal relationships among latent model capabilities.
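
The modeling assumption above can be illustrated with a small simulation. This is only a minimal sketch, not the authors' implementation: the edge weights in `B`, the mixing matrix `G`, and the function names are hypothetical, the data are synthetic, and the base-model confounder is left out for brevity. It draws three latent factors from a linear chain (general problem solving, then instruction following, then mathematical reasoning), maps them linearly to six benchmark scores, and checks that the resulting score matrix has rank three.

```python
import numpy as np


def simulate_scores(n_models=1500, n_benchmarks=6, seed=0):
    """Draw synthetic benchmark scores from a 3-node linear causal chain."""
    rng = np.random.default_rng(seed)
    # Hypothetical edge weights: z1 -> z2 -> z3 (row i lists the parents of z_i).
    B = np.array([[0.0, 0.0, 0.0],
                  [0.8, 0.0, 0.0],
                  [0.0, 0.6, 0.0]])
    eps = rng.normal(size=(n_models, 3))  # independent noise per latent factor
    # Structural equations z = B z + eps imply z = (I - B)^{-1} eps.
    Z = eps @ np.linalg.inv(np.eye(3) - B).T
    # Hypothetical linear map from latent factors to benchmark scores.
    G = rng.normal(size=(n_benchmarks, 3))
    X = Z @ G.T
    return X, Z, G


def effective_rank(X, tol=1e-8):
    """Count singular values of the centered matrix above a relative tolerance."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    return int(np.sum(s > tol * s[0]))


X, Z, G = simulate_scores()
print(X.shape, effective_rank(X))
```

In this noiseless toy setting the six score columns are exact linear combinations of three factors, so the rank check recovers the number of latent capabilities. Real leaderboard data would contain measurement noise and base-model effects, so the rank would only be approximately low.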
