The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment

Abstract

We investigate whether post-trained capabilities can be transferred across models without retraining, with a focus on transfer across different model scales. We propose the Master Key Hypothesis, which states that model capabilities correspond to directions in a low-dimensional latent subspace that induce specific behaviors and are transferable across models through linear alignment. Based on this hypothesis, we introduce UNLOCK, a training-free and label-free framework that extracts a capability direction by contrasting activations between capability-present and capability-absent Source variants, aligns it with a Target model through a low-rank linear transformation, and applies it at inference time to elicit the behavior. Experiments on reasoning behaviors, including Chain-of-Thought (CoT) and mathematical reasoning, demonstrate substantial improvements across model scales without training. For example, transferring CoT reasoning from Qwen1.5-14B to Qwen1.5-7B yields an accuracy gain of 12.1% on MATH, and transferring a mathematical reasoning direction from Qwen3-4B-Base to Qwen3-14B-Base improves AGIEval Math accuracy from 61.1% to 71.3%, surpassing the 67.8% achieved by the 14B post-trained model. Our analysis shows that the success of transfer depends on the capabilities learned during pre-training, and that our intervention amplifies latent capabilities by sharpening the output distribution toward successful reasoning trajectories.
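As a rough illustration of the three steps named in the abstract (extract, align, apply), the following NumPy sketch runs on noise-free synthetic activations. It is not the UNLOCK implementation: the mean-difference direction, the least-squares fit truncated by SVD, the additive update with strength `alpha`, and all function and variable names are assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np


def capability_direction(acts_with, acts_without):
    """Assumed extraction step: mean difference between capability-present
    and capability-absent source activations, shape (d_src,)."""
    return (acts_with - acts_without).mean(axis=0)


def fit_low_rank_map(src_acts, tgt_acts, rank):
    """Assumed alignment step: least-squares map W with src_acts @ W ~= tgt_acts,
    truncated to the top `rank` singular components. Shape (d_src, d_tgt)."""
    W, *_ = np.linalg.lstsq(src_acts, tgt_acts, rcond=None)
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * S[:rank]) @ Vt[:rank]


def steer(tgt_hidden, direction_tgt, alpha=1.0):
    """Assumed application step: add the aligned direction to target
    hidden states at inference time."""
    return tgt_hidden + alpha * direction_tgt


# Synthetic stand-ins for real model activations (hypothetical sizes).
rng = np.random.default_rng(0)
n, d_src, d_tgt, rank = 256, 32, 48, 4

# Both "models" embed the same low-dimensional latent states linearly.
latent = rng.normal(size=(n, rank))
A_src = rng.normal(size=(rank, d_src))
A_tgt = rng.normal(size=(rank, d_tgt))
src_base = latent @ A_src
tgt_base = latent @ A_tgt

# The capability is modeled as a fixed shift in the shared latent space.
shift = rng.normal(size=rank)
src_with = (latent + shift) @ A_src

v_src = capability_direction(src_with, src_base)   # extract
W = fit_low_rank_map(src_base, tgt_base, rank)     # align
v_tgt = v_src @ W                                  # direction in target space
steered = steer(tgt_base, v_tgt, alpha=2.0)        # apply

# Compare against the shift the target space would show if it had the capability.
true_tgt = shift @ A_tgt
cosine = v_tgt @ true_tgt / (np.linalg.norm(v_tgt) * np.linalg.norm(true_tgt))
print(f"cosine(aligned, true) = {cosine:.4f}")
```

In this toy setting the source and target share an exactly linear latent structure, so the aligned direction matches the true target shift; real activations are noisy and only approximately linearly related, so the agreement would be weaker.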
