Darwin Family: Training-Free Scaling of Language Model Reasoning via MRI-Trust-Weighted Evolutionary Merging

Abstract

We present Darwin Family, a framework for training-free evolutionary merging of large language models via gradient-free weight-space recombination. We ask whether frontier-level reasoning performance can be improved without additional training, by reorganizing latent capabilities already encoded in existing checkpoints. Darwin introduces three key ideas: (i) a 14-dimensional adaptive merge genome enabling fine-grained component- and block-level recombination; (ii) MRI-Trust Fusion, which adaptively balances diagnostic layer-importance signals with evolutionary search through a learnable trust parameter; and (iii) an Architecture Mapper that enables cross-architecture breeding between heterogeneous model families. Empirically, the flagship Darwin-27B-Opus achieves 86.9% on GPQA Diamond, ranking #6 among 1,252 evaluated models, and outperforming its fully trained foundation model without any gradient-based training. Across scales from 4B to 35B parameters, Darwin models consistently improve over their parents, support recursive multi-generation evolution, and enable a training-free evolutionary merge that combines Transformer- and Mamba-based components. Together, the Darwin Family demonstrates that diagnostic-guided evolutionary merging is a practical and reproducible alternative to costly post-training pipelines for reasoning-centric language models.
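The abstract names Darwin's ingredients but does not specify them. The sketch below shows how such a loop could fit together. It is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that MRI-Trust Fusion is a convex blend, `trust * mri + (1 - trust) * gene`, and that each layer is a flat list of floats. It also assumes that layer `i` reads gene `i % 14`, that the trust parameter is evolved alongside the genes rather than learned by gradients, and that selection is plain truncation with Gaussian mutation. The names `Genome`, `fused_ratio`, `merge_layer`, `merge_models`, `mutate`, and `evolve` are hypothetical.

```python
import random
from dataclasses import dataclass

GENOME_DIM = 14  # dimension stated in the abstract; what each gene controls is assumed here


@dataclass
class Genome:
    genes: list[float]  # hypothetical per-block merge ratios, each in [0, 1]
    trust: float  # hypothetical trust weight in [0, 1], evolved together with the genes


def _clip(x: float) -> float:
    """Keep a value inside [0, 1]."""
    return min(1.0, max(0.0, x))


def fused_ratio(genome_ratio: float, mri_importance: float, trust: float) -> float:
    """Assumed MRI-Trust Fusion: convex blend of the evolved ratio and the diagnostic signal."""
    return trust * mri_importance + (1.0 - trust) * genome_ratio


def merge_layer(w_a: list[float], w_b: list[float], ratio: float) -> list[float]:
    """Linear interpolation of two parents' weights (ratio=0 keeps parent A, 1 keeps parent B)."""
    return [(1.0 - ratio) * a + ratio * b for a, b in zip(w_a, w_b)]


def merge_models(parent_a, parent_b, genome: Genome, mri_importance):
    """Merge layer by layer; layer i reads gene i % GENOME_DIM (a hypothetical mapping)."""
    merged = []
    for i, (w_a, w_b) in enumerate(zip(parent_a, parent_b)):
        ratio = fused_ratio(genome.genes[i % GENOME_DIM], mri_importance[i], genome.trust)
        merged.append(merge_layer(w_a, w_b, ratio))
    return merged


def random_genome(rng: random.Random) -> Genome:
    """Draw GENOME_DIM genes and a trust value uniformly from [0, 1)."""
    return Genome([rng.random() for _ in range(GENOME_DIM)], rng.random())


def mutate(genome: Genome, rng: random.Random, sigma: float = 0.1) -> Genome:
    """Gaussian mutation of every gene and of the trust value, clipped back into [0, 1]."""
    genes = [_clip(g + rng.gauss(0.0, sigma)) for g in genome.genes]
    return Genome(genes, _clip(genome.trust + rng.gauss(0.0, sigma)))


def evolve(fitness, generations: int = 20, pop_size: int = 8, seed: int = 0) -> Genome:
    """Gradient-free search: keep the top half, refill with mutated copies of survivors."""
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]  # elitism: the best genome is never lost
        children = [mutate(rng.choice(elite), rng) for _ in range(pop_size - len(elite))]
        population = elite + children
    return max(population, key=fitness)


if __name__ == "__main__":
    # Toy fitness standing in for a benchmark score of the merged model (hypothetical).
    def toy_fitness(g: Genome) -> float:
        return -sum((x - 0.5) ** 2 for x in g.genes)

    best = evolve(toy_fitness)
    parent_a = [[0.0, 0.0], [1.0, 1.0]]
    parent_b = [[1.0, 2.0], [3.0, 3.0]]
    child = merge_models(parent_a, parent_b, best, mri_importance=[0.9, 0.1])
    print(best.trust, child)
```

In a real setting the fitness function would merge the two checkpoints with `merge_models` and score the child on a held-out reasoning benchmark. The trust weight then decides how much the diagnostic per-layer signal overrides the evolved ratios: at `trust = 1` the merge follows the MRI signal alone, and at `trust = 0` it follows the genome alone.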