Learn-by-Wire Training-Control Governance: Bounded Autonomous Training Under Stress for Stability and Efficiency

Abstract

Modern language-model training is increasingly exposed to instability, degraded runs, and wasted compute, especially under aggressive learning-rate, scale, and runtime-stress conditions. This paper introduces Learn-by-Wire Guard (LBW-Guard), a bounded autonomous training-control governance layer that operates above AdamW. Rather than replacing the optimizer update rule, LBW-Guard observes training telemetry, interprets instability-sensitive regimes, and applies bounded control to optimizer execution while preserving fixed training objectives. We evaluate LBW-Guard in a Qwen2.5-centered stress-and-robustness suite using WikiText-103, with Qwen2.5-7B as the empirical anchor, model-size comparisons against Qwen2.5-3B and Qwen2.5-14B, learning-rate stress tests, gradient-clipping baselines, and a no-LoRA TinyLlama-1B full-parameter sanity check. In the 7B reference setting, LBW-Guard reduces final perplexity from 13.21 to 10.74, an 18.7% improvement, while reducing end-to-end time from 392.54s to 357.02s, a 1.10x speedup. Under stronger learning-rate stress, AdamW degrades to 1885.24 final perplexity at LR=3e-3 and 659.76 at LR=1e-3, whereas LBW-Guard remains trainable at 11.57 and 10.33, respectively. Gradient-clipping baselines do not reproduce this effect. These results support a scoped systems conclusion that stability-sensitive LLM training can benefit from a governance plane above the optimizer. LBW-Guard provides evidence that bounded runtime control can preserve productive compute under stress while remaining distinct from optimizer replacement and local gradient suppression.
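
To make the idea of a bounded control layer above the optimizer concrete, the following is a minimal illustrative sketch. It is not the LBW-Guard algorithm, whose telemetry signals and control rules are not specified in this abstract. The class name `BoundedLRGuard`, its parameters, and the loss-spike heuristic are all hypothetical assumptions. The sketch only shows the general pattern: observe a telemetry signal (here, the loss), detect an instability-sensitive regime, and adjust how the optimizer is executed (here, a learning-rate multiplier) within fixed bounds, without changing the update rule or the training objective.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class BoundedLRGuard:
    """Hypothetical governance layer; NOT the LBW-Guard algorithm from the paper.

    All thresholds below are illustrative assumptions.
    """
    base_lr: float
    min_scale: float = 0.1     # lower bound on the LR multiplier (assumed)
    spike_ratio: float = 1.5   # loss/EMA ratio treated as instability (assumed)
    backoff: float = 0.5       # multiplicative cut applied on a spike (assumed)
    recovery: float = 1.05     # multiplicative recovery on calm steps (assumed)
    ema_decay: float = 0.9     # smoothing factor for the loss telemetry (assumed)
    scale: float = 1.0         # current multiplier, kept in [min_scale, 1.0]
    ema_loss: Optional[float] = None

    def step(self, loss: float) -> float:
        """Observe one loss value and return the learning rate for the next update."""
        if not math.isfinite(loss):
            # Non-finite loss: back off and keep the EMA untouched.
            self.scale = max(self.min_scale, self.scale * self.backoff)
            return self.base_lr * self.scale
        if self.ema_loss is None:
            # First observation only initializes the telemetry.
            self.ema_loss = loss
        elif loss > self.spike_ratio * self.ema_loss:
            # Spike relative to the running average: cut the multiplier, bounded below.
            self.scale = max(self.min_scale, self.scale * self.backoff)
        else:
            # Calm step: recover gradually, bounded above by 1.0.
            self.scale = min(1.0, self.scale * self.recovery)
        self.ema_loss = self.ema_decay * self.ema_loss + (1.0 - self.ema_decay) * loss
        return self.base_lr * self.scale


guard = BoundedLRGuard(base_lr=1e-3)
losses = [2.0, 1.9, 1.8, 5.0, 1.8]          # synthetic losses with one spike
lrs = [guard.step(loss) for loss in losses]
```

In a PyTorch training loop, one plausible way to apply the returned value would be to write it into each entry of `optimizer.param_groups` (key `"lr"`) before calling `optimizer.step()`, so that AdamW's update rule stays untouched. Whether LBW-Guard acts on the learning rate, on other execution parameters, or on both is not stated here.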