**Arcee Trinity Large Technical Report**
February 19, 2026
Authors: Varun Singh, Lucas Krauss, Sami Jaghouar, Matej Sirovatka, Charles Goddard, Fares Obied, Jack Min Ong, Jannik Straube, Fern, Aria Harley, Conner Stewart, Colin Kealty, Maziyar Panahi, Simon Kirsten, Anushka Deshpande, Anneketh Vij, Arthur Bresnu, Pranav Veldurthi, Raghav Ravishankar, Hardik Bishnoi, DatologyAI Team, Arcee AI Team, Prime Intellect Team, Mark McQuade, Johannes Hagemann, Lucas Atkins
cs.AI
Abstract
We present the technical report for Arcee Trinity Large, a sparse Mixture-of-Experts model with 400B total parameters and 13B activated per token. Additionally, we report on Trinity Nano and Trinity Mini: Trinity Nano has 6B total parameters with 1B activated per token, and Trinity Mini has 26B total parameters with 3B activated per token. The models' modern architecture includes interleaved local and global attention, gated attention, depth-scaled sandwich norm, and sigmoid routing for Mixture-of-Experts. For Trinity Large, we also introduce a new MoE load balancing strategy titled Soft-clamped Momentum Expert Bias Updates (SMEBU). We train the models using the Muon optimizer, and all three models completed training with zero loss spikes. Trinity Nano and Trinity Mini were pre-trained on 10 trillion tokens, and Trinity Large was pre-trained on 17 trillion tokens. The model checkpoints are available at https://huggingface.co/arcee-ai.
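Since the abstract names sigmoid routing and a bias-based load-balancing scheme without detailing them, the sketch below illustrates the general pattern of sigmoid top-k MoE routing with a selection-only expert bias. It is a minimal illustration, not the Trinity router and not the paper's SMEBU rule: the function names (`sigmoid_route`, `update_expert_bias`), the momentum and clamping constants, and the update schedule are all assumptions made for the example.

```python
# Minimal sketch of sigmoid top-k MoE routing with a selection-only expert bias.
# Generic illustration only; NOT the Trinity router. The bias update below is a
# hypothetical stand-in for the paper's (unspecified here) SMEBU rule.
import torch


def sigmoid_route(hidden, router_weight, expert_bias, top_k=2):
    """Pick top_k experts per token from sigmoid gate scores.

    hidden:        [num_tokens, d_model] token representations
    router_weight: [num_experts, d_model] router projection
    expert_bias:   [num_experts] load-balancing bias (affects selection only)
    """
    scores = torch.sigmoid(hidden @ router_weight.t())            # [tokens, experts]
    # The bias shifts which experts win top-k, but not the combine weights.
    _, expert_idx = torch.topk(scores + expert_bias, top_k, dim=-1)
    gate = torch.gather(scores, -1, expert_idx)                    # [tokens, top_k]
    gate = gate / gate.sum(dim=-1, keepdim=True)                   # normalized combine weights
    return expert_idx, gate


def update_expert_bias(expert_bias, momentum_buf, expert_idx, num_experts,
                       lr=1e-3, beta=0.9, clamp=1.0):
    """Hypothetical momentum-style bias update: over-loaded experts are pushed
    down, under-loaded experts pushed up, and the bias is kept in a bounded
    range via tanh (a stand-in for an unspecified "soft clamp")."""
    load = torch.bincount(expert_idx.flatten(), minlength=num_experts).float()
    momentum_buf.mul_(beta).add_(load.mean() - load)               # signed load error with momentum
    expert_bias = clamp * torch.tanh((expert_bias + lr * momentum_buf) / clamp)
    return expert_bias, momentum_buf


# Example usage with random tensors (shapes only; not trained parameters).
tokens, d_model, n_exp = 8, 16, 4
h = torch.randn(tokens, d_model)
W = torch.randn(n_exp, d_model)
bias, mom = torch.zeros(n_exp), torch.zeros(n_exp)
idx, gate = sigmoid_route(h, W, bias)
bias, mom = update_expert_bias(bias, mom, idx, n_exp)
```

In this family of schemes, the balancing correction lives entirely in the selection bias, so the combine weights stay driven by the gate scores alone rather than by an auxiliary balancing loss; how Trinity's soft clamping and momentum are actually defined is left to the full report.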