The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment

Abstract

We investigate whether post-trained capabilities can be transferred across models without retraining, with a focus on transfer across different model scales. We propose the Master Key Hypothesis, which states that model capabilities correspond to directions in a low-dimensional latent subspace that induce specific behaviors and are transferable across models through linear alignment. Based on this hypothesis, we introduce UNLOCK, a training-free and label-free framework that extracts a capability direction by contrasting activations between capability-present and capability-absent Source variants, aligns it with a Target model through a low-rank linear transformation, and applies it at inference time to elicit the behavior. Experiments on reasoning behaviors, including Chain-of-Thought (CoT) and mathematical reasoning, demonstrate substantial improvements across model scales without training. For example, transferring CoT reasoning from Qwen1.5-14B to Qwen1.5-7B yields an accuracy gain of 12.1% on MATH, and transferring a mathematical reasoning direction from Qwen3-4B-Base to Qwen3-14B-Base improves AGIEval Math accuracy from 61.1% to 71.3%, surpassing the 67.8% achieved by the 14B post-trained model. Our analysis shows that the success of transfer depends on the capabilities learned during pre-training, and that our intervention amplifies latent capabilities by sharpening the output distribution toward successful reasoning trajectories.
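
The three steps named above (contrast activations, align with a low-rank linear map, apply at inference) can be illustrated with a minimal NumPy sketch. The abstract does not specify how UNLOCK implements any of them, so everything here is an assumption made for illustration: the difference-of-means contrast, the truncated-SVD least-squares fit standing in for the alignment step, the additive steering rule, and all function and parameter names (capability_direction, fit_low_rank_map, steer, rank, alpha) are hypothetical.

```python
import numpy as np


def capability_direction(acts_with, acts_without):
    """Assumed contrast: difference of mean activations, each of shape (n, d_src)."""
    return acts_with.mean(axis=0) - acts_without.mean(axis=0)


def fit_low_rank_map(src_acts, tgt_acts, rank):
    """Fit W (d_src x d_tgt) so that src_acts @ W approximates tgt_acts.

    Assumption: ordinary least squares followed by SVD truncation to `rank`.
    This is a simple low-rank approximation, not necessarily the paper's method.
    """
    W, *_ = np.linalg.lstsq(src_acts, tgt_acts, rcond=None)
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * S[:rank]) @ Vt[:rank]


def steer(hidden, direction, alpha):
    """Assumed intervention: add the unit-normalised direction, scaled by alpha."""
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit


# Synthetic stand-ins for real model activations (hypothetical sizes).
rng = np.random.default_rng(0)
d_src, d_tgt, n = 16, 24, 200

# Source activations on the same prompts, without and with the capability.
src_without = rng.normal(size=(n, d_src))
src_with = src_without + rng.normal(size=d_src)
direction_src = capability_direction(src_with, src_without)

# Paired Source/Target activations used to fit the alignment map.
true_map = rng.normal(size=(d_src, 4)) @ rng.normal(size=(4, d_tgt))
tgt_acts = src_without @ true_map
W = fit_low_rank_map(src_without, tgt_acts, rank=4)

# Map the direction into the Target space and apply it to Target hidden states.
direction_tgt = direction_src @ W
steered = steer(rng.normal(size=(5, d_tgt)), direction_tgt, alpha=4.0)
```

In this toy setup the Target activations are an exact rank-4 linear function of the Source activations, so the fitted map recovers it; real activations would only be approximately aligned, and the choice of layer, rank, and alpha would need to be tuned.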
