LayerRoute: Input-Conditioned Adaptive Layer Skipping via LoRA Fine-Tuning for Agentic Language Models
June 1, 2026
Author: Prateek Kumar Sikdar
cs.AI
Abstract
Agentic language model systems alternate between two structurally distinct step types: structured tool calls (short, deterministic, low perplexity) and open-ended planning/reasoning steps (long, complex, high perplexity). Despite this heterogeneity, current inference systems spend the same compute on every step. We introduce LayerRoute, a lightweight adapter that learns to selectively skip transformer blocks on a per-input basis. LayerRoute augments each of the 24 transformer blocks in Qwen2.5-0.5B-Instruct with: (1) a per-layer router (897 parameters, Linear(896,1)) that outputs a hard binary gate via the straight-through estimator, and (2) LoRA adapters (rank 8, ~1.08M parameters in total) on the Q/K/V/O attention projections. The backbone weights remain frozen. A single end-to-end training pass on agentic data (Hermes, Glaive, GSM8K, Turing) with a gate regularisation term forces the system to discover which blocks are skippable for each input type. After 3,000 steps (6.4 minutes on an A100 40GB), LayerRoute achieves a skip differential of 12.91 percentage points: tool calls skip 15.25% of FLOPs while planning steps skip only 2.34%, using only 1.10M trainable parameters (0.22% of the 494M backbone). Quality improves over the base model due to LoRA adaptation, with perplexity deltas of -1.29 on tool calls and -1.30 on planning steps.
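The parameter and skip figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not the author's code. It rests on two assumptions that the abstract does not state: that the K and V projections in Qwen2.5-0.5B-Instruct map 896 dimensions to 128 (grouped-query attention with 2 KV heads of width 64), and that each LoRA pair adds rank × (d_in + d_out) weights. All variable and function names are hypothetical.

```python
# Back-of-the-envelope check of the counts quoted in the abstract.
# ASSUMPTION: K/V projections are 896 -> 128 (grouped-query attention).
# ASSUMPTION: a LoRA pair holds rank * (d_in + d_out) weights, no biases.

HIDDEN = 896                   # router input width, from Linear(896, 1)
KV_DIM = 128                   # assumed: 2 KV heads x 64 dims per head
NUM_BLOCKS = 24                # transformer blocks in the backbone
RANK = 8                       # LoRA rank from the abstract
BACKBONE_PARAMS = 494_000_000  # 494M frozen backbone parameters


def lora_params(d_in, d_out, rank=RANK):
    """Weights in one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# One router is a single linear unit: 896 weights plus 1 bias.
router_per_block = HIDDEN + 1

lora_per_block = (
    lora_params(HIDDEN, HIDDEN)    # Q projection
    + lora_params(HIDDEN, KV_DIM)  # K projection
    + lora_params(HIDDEN, KV_DIM)  # V projection
    + lora_params(HIDDEN, HIDDEN)  # O projection
)

router_total = NUM_BLOCKS * router_per_block
lora_total = NUM_BLOCKS * lora_per_block
trainable_total = router_total + lora_total
trainable_fraction = trainable_total / BACKBONE_PARAMS

# Skip differential: tool-call skip rate minus planning skip rate.
skip_differential = round(15.25 - 2.34, 2)

print(f"router weights:    {router_total:,}")
print(f"LoRA weights:      {lora_total:,}")
print(f"trainable total:   {trainable_total:,}")
print(f"share of backbone: {trainable_fraction:.2%}")
print(f"skip differential: {skip_differential} percentage points")
```

Under these assumptions the totals come to 21,528 router weights and 1,081,344 LoRA weights, for 1,102,872 trainable parameters, which matches the ~1.08M, 1.10M, and 0.22% figures in the abstract. The 12.91 figure is the plain difference between the two skip rates, which is why it reads most naturally as percentage points.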