Learn-by-Wire Training Control Governance: Bounded Autonomous Training Under Stress for Stability and Efficiency

Abstract

Modern language-model training is increasingly exposed to instability, degraded runs, and wasted compute, especially under aggressive learning-rate, scale, and runtime-stress conditions. This paper introduces Learn-by-Wire Guard (LBW-Guard), a bounded autonomous training-control governance layer that operates above AdamW. Rather than replacing the optimizer update rule, LBW-Guard observes training telemetry, interprets instability-sensitive regimes, and applies bounded control to optimizer execution while keeping the training objective fixed. We evaluate LBW-Guard in a Qwen2.5-centered stress-and-robustness suite on WikiText-103, with Qwen2.5-7B as the empirical anchor. The suite comprises model-size comparisons against Qwen2.5-3B and Qwen2.5-14B, learning-rate stress tests, gradient-clipping baselines, and a no-LoRA, full-parameter sanity check on TinyLlama-1B. In the 7B reference setting, LBW-Guard reduces final perplexity from 13.21 to 10.74, an 18.7% improvement, while reducing end-to-end time from 392.54 s to 357.02 s, a 1.10x speedup. Under stronger learning-rate stress, AdamW degrades to a final perplexity of 1885.24 at LR=3e-3 and 659.76 at LR=1e-3, whereas LBW-Guard remains trainable at 11.57 and 10.33, respectively. Gradient-clipping baselines do not reproduce this effect. These results support a scoped systems conclusion: stability-sensitive LLM training can benefit from a governance plane above the optimizer. LBW-Guard provides evidence that bounded runtime control can preserve productive compute under stress while remaining distinct from both optimizer replacement and local gradient suppression.
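The headline figures above can be recomputed from the reported raw numbers. The following is a minimal Python sketch of that arithmetic only; the helper names `relative_reduction` and `speedup` are illustrative assumptions and are not part of LBW-Guard, and the inputs are the values quoted in the abstract.

```python
# Recompute the summary statistics quoted in the abstract from its raw figures.
# Helper names are hypothetical; only the numeric inputs come from the text.

def relative_reduction(baseline: float, treated: float) -> float:
    """Fractional reduction of `treated` relative to `baseline`."""
    return (baseline - treated) / baseline

def speedup(baseline_time: float, treated_time: float) -> float:
    """Wall-clock speedup factor (baseline time divided by treated time)."""
    return baseline_time / treated_time

# Qwen2.5-7B reference setting: final perplexity and end-to-end seconds.
ppl_adamw, ppl_guard = 13.21, 10.74
time_adamw, time_guard = 392.54, 357.02

ppl_gain = relative_reduction(ppl_adamw, ppl_guard)
time_speedup = speedup(time_adamw, time_guard)

print(f"perplexity reduction: {ppl_gain:.1%}")   # 18.7%
print(f"end-to-end speedup: {time_speedup:.2f}x")  # 1.10x

# Learning-rate stress runs: (AdamW perplexity, LBW-Guard perplexity).
stress_runs = {"3e-3": (1885.24, 11.57), "1e-3": (659.76, 10.33)}
stress_ratios = {lr: adamw / guard for lr, (adamw, guard) in stress_runs.items()}
for lr, ratio in stress_ratios.items():
    print(f"LR={lr}: AdamW / LBW-Guard perplexity ratio = {ratio:.1f}")
```

The perplexity ratios for the stress runs are a derived convenience rather than a figure reported in the abstract; they only express how far apart the two final perplexities are at each learning rate.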