Learn-by-Wire Training Control Governance: Bounded Autonomous Training Under Stress for Stability and Efficiency

Abstract

Modern language-model training is increasingly exposed to instability, degraded runs, and wasted compute, especially under aggressive learning rates, larger model scale, and runtime stress. This paper introduces Learn-by-Wire Guard (LBW-Guard), a bounded autonomous training-control governance layer that operates above AdamW. Rather than replacing the optimizer's update rule, LBW-Guard observes training telemetry, identifies instability-sensitive regimes, and applies bounded control to optimizer execution while keeping the training objective fixed. We evaluate LBW-Guard in a Qwen2.5-centered stress-and-robustness suite on WikiText-103, with Qwen2.5-7B as the empirical anchor, model-size comparisons against Qwen2.5-3B and Qwen2.5-14B, learning-rate stress tests, gradient-clipping baselines, and a no-LoRA, full-parameter TinyLlama-1B sanity check. In the 7B reference setting, LBW-Guard reduces final perplexity from 13.21 to 10.74, an 18.7% improvement, while reducing end-to-end time from 392.54 s to 357.02 s, a 1.10x speedup. Under stronger learning-rate stress, AdamW degrades to a final perplexity of 1885.24 at LR=3e-3 and 659.76 at LR=1e-3, whereas LBW-Guard remains trainable at 11.57 and 10.33, respectively. Gradient-clipping baselines do not reproduce this effect. These results support a scoped, systems-level conclusion: stability-sensitive LLM training can benefit from a governance plane above the optimizer. LBW-Guard provides evidence that bounded runtime control can preserve productive compute under stress while remaining distinct from both optimizer replacement and local gradient suppression.
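To make the idea of a governance layer above the optimizer more concrete, the following is a minimal sketch of one possible bounded controller. It is not the LBW-Guard algorithm, which the abstract does not specify. The class name `BoundedLRGovernor`, the choice of loss as the only telemetry signal, the spike heuristic, and every threshold below are assumptions made purely for illustration. The sketch only shows the general pattern: observe telemetry, flag an instability-sensitive regime, and adjust how the optimizer is executed within fixed bounds, without touching its update rule or the training objective.

```python
import math
from collections import deque
from dataclasses import dataclass, field
from statistics import median


@dataclass
class BoundedLRGovernor:
    """Hypothetical governor: scales a base learning rate within fixed bounds."""

    base_lr: float
    min_scale: float = 0.1    # assumed lower bound on the LR multiplier
    max_scale: float = 1.0    # assumed upper bound: never exceed the configured LR
    spike_ratio: float = 1.5  # assumed: loss above 1.5x the recent median is a spike
    backoff: float = 0.5      # assumed multiplier applied when a spike is seen
    recovery: float = 1.1     # assumed multiplier applied on a calm step
    window: int = 8           # assumed number of recent losses kept as telemetry
    scale: float = 1.0
    _losses: deque = field(init=False, repr=False)

    def __post_init__(self) -> None:
        self._losses = deque(maxlen=self.window)

    def observe(self, loss: float) -> float:
        """Ingest one loss reading and return the LR to use for the next step."""
        if len(self._losses) >= 2:
            baseline = median(self._losses)
            unstable = (not math.isfinite(loss)) or loss > self.spike_ratio * baseline
            # Back off on instability, recover slowly otherwise.
            self.scale *= self.backoff if unstable else self.recovery
        # The control is bounded: the multiplier never leaves [min_scale, max_scale].
        self.scale = min(self.max_scale, max(self.min_scale, self.scale))
        if math.isfinite(loss):
            self._losses.append(loss)  # keep only finite readings as telemetry
        return self.base_lr * self.scale


if __name__ == "__main__":
    governor = BoundedLRGovernor(base_lr=1e-3)
    for step, loss in enumerate([2.0, 1.9, 1.8, 9.0, 1.8, 1.7]):
        print(step, loss, governor.observe(loss))
```

In a PyTorch training loop, the returned value could be written into each entry of `optimizer.param_groups` under the `"lr"` key before calling `optimizer.step()`, which leaves AdamW's update rule itself unchanged. A real governance layer would presumably draw on richer telemetry than a single loss stream.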