Learn-by-Wire Training Control Governance: Bounded Autonomous Training Under Stress for Stability and Efficiency

Abstract

Modern language-model training is increasingly exposed to instability, degraded runs, and wasted compute, especially under aggressive learning-rate, scale, and runtime-stress conditions. This paper introduces Learn-by-Wire Guard (LBW-Guard), a bounded autonomous training-control governance layer that operates above AdamW. Rather than replacing the optimizer update rule, LBW-Guard observes training telemetry, interprets instability-sensitive regimes, and applies bounded control to optimizer execution while preserving fixed training objectives. We evaluate LBW-Guard in a Qwen2.5-centered stress-and-robustness suite using WikiText-103, with Qwen2.5-7B as the empirical anchor, model-size comparisons against Qwen2.5-3B and Qwen2.5-14B, learning-rate stress tests, gradient-clipping baselines, and a no-LoRA TinyLlama-1B full-parameter sanity check. In the 7B reference setting, LBW-Guard reduces final perplexity from 13.21 to 10.74, an 18.7% improvement, while reducing end-to-end time from 392.54s to 357.02s, a 1.10x speedup. Under stronger learning-rate stress, AdamW degrades to a final perplexity of 1885.24 at LR=3e-3 and 659.76 at LR=1e-3, whereas LBW-Guard remains trainable at 11.57 and 10.33, respectively. Gradient-clipping baselines do not reproduce this effect. These results support a scoped systems conclusion: stability-sensitive LLM training can benefit from a governance plane above the optimizer. LBW-Guard provides evidence that bounded runtime control can preserve productive compute under stress while remaining distinct from optimizer replacement and local gradient suppression.
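
As a reading aid, the headline figures in the abstract can be recomputed from the raw numbers it reports. The short sketch below assumes the conventional definitions (relative reduction as (baseline - treated) / baseline, speedup as baseline time / treated time). The helper names are illustrative and are not taken from the paper.

```python
def relative_reduction(baseline: float, treated: float) -> float:
    """Fraction by which `treated` is lower than `baseline`."""
    return (baseline - treated) / baseline


def speedup(baseline_seconds: float, treated_seconds: float) -> float:
    """Ratio of baseline wall-clock time to treated wall-clock time."""
    return baseline_seconds / treated_seconds


# 7B reference setting: AdamW vs. LBW-Guard (numbers from the abstract).
ppl_gain = relative_reduction(13.21, 10.74)
time_speedup = speedup(392.54, 357.02)

# Learn-rate stress: ratio of AdamW to LBW-Guard final perplexity.
stress_ratio_3e3 = 1885.24 / 11.57
stress_ratio_1e3 = 659.76 / 10.33

print(f"perplexity reduction: {ppl_gain:.1%}")   # 18.7%
print(f"end-to-end speedup:   {time_speedup:.2f}x")  # 1.10x
print(f"AdamW / LBW-Guard perplexity at LR=3e-3: {stress_ratio_3e3:.1f}x")
print(f"AdamW / LBW-Guard perplexity at LR=1e-3: {stress_ratio_1e3:.1f}x")
```

The first two printed values match the 18.7% improvement and 1.10x speedup stated above. The stress ratios are derived quantities and are not themselves reported in the abstract.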