The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment
April 7, 2026
Authors: Rishab Balasubramanian, Pin-Jie Lin, Rituraj Sharma, Anjie Fang, Fardin Abdi, Viktor Rozgic, Zheng Du, Mohit Bansal, Tu Vu
cs.AI
Abstract
We investigate whether post-trained capabilities can be transferred across models without retraining, with a focus on transfer across different model scales. We propose the Master Key Hypothesis, which states that model capabilities correspond to directions in a low-dimensional latent subspace that induce specific behaviors and are transferable across models through linear alignment. Based on this hypothesis, we introduce UNLOCK, a training-free and label-free framework that extracts a capability direction by contrasting activations between capability-present and capability-absent Source variants, aligns it with a Target model through a low-rank linear transformation, and applies it at inference time to elicit the behavior. Experiments on reasoning behaviors, including Chain-of-Thought (CoT) and mathematical reasoning, demonstrate substantial improvements across model scales without training. For example, transferring CoT reasoning from Qwen1.5-14B to Qwen1.5-7B yields an accuracy gain of 12.1% on MATH, and transferring a mathematical reasoning direction from Qwen3-4B-Base to Qwen3-14B-Base improves AGIEval Math accuracy from 61.1% to 71.3%, surpassing the 67.8% achieved by the 14B post-trained model. Our analysis shows that the success of transfer depends on the capabilities learned during pre-training, and that our intervention amplifies latent capabilities by sharpening the output distribution toward successful reasoning trajectories.
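The three-step pipeline the abstract describes — extract a capability direction by contrasting activations of capability-present and capability-absent source variants, align it to the target model with a low-rank linear map, then add it to hidden states at inference — can be sketched as follows. This is a minimal illustration with synthetic activations, not the paper's implementation: the dimensions, the least-squares-plus-truncated-SVD alignment, the steering coefficient `alpha`, and all variable names are assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d_src, d_tgt, n = 64, 96, 512  # hypothetical hidden sizes and sample count

# Synthetic paired activations: Source and Target run on the same prompts.
H_src = rng.normal(size=(n, d_src))
W_true = rng.normal(size=(d_src, d_tgt)) / np.sqrt(d_src)
H_tgt = H_src @ W_true + 0.01 * rng.normal(size=(n, d_tgt))

# Step 1: capability direction = difference of mean activations between
# the capability-present and capability-absent Source variants.
acts_present = rng.normal(size=(n, d_src)) + 1.0  # stand-in activations
acts_absent = rng.normal(size=(n, d_src))
v_src = acts_present.mean(axis=0) - acts_absent.mean(axis=0)
v_src /= np.linalg.norm(v_src)

# Step 2: fit a linear Source->Target map on paired activations by least
# squares, then truncate its SVD to obtain a low-rank alignment.
W, *_ = np.linalg.lstsq(H_src, H_tgt, rcond=None)
U, S, Vt = np.linalg.svd(W, full_matrices=False)
r = 8  # assumed rank of the alignment
W_lowrank = U[:, :r] @ np.diag(S[:r]) @ Vt[:r, :]

# Step 3: map the direction into the Target space and steer a hidden
# state at inference time by adding the scaled direction.
v_tgt = v_src @ W_lowrank
v_tgt /= np.linalg.norm(v_tgt)

def steer(hidden, direction, alpha=4.0):
    """Add the scaled capability direction to a Target hidden state."""
    return hidden + alpha * direction

h_steered = steer(rng.normal(size=d_tgt), v_tgt)
```

In practice the activations would come from forward hooks on the two Source variants and the Target model, and `alpha` would be tuned on a small validation set, since over-steering can degrade fluency.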