S0 Tuning: Zero-Overhead Adaptation of Hybrid Recurrent-Attention Models

Abstract


Tuning a single initial state matrix per recurrent layer on roughly 48 execution-verified HumanEval training solutions outperforms LoRA by +10.8 pp on HumanEval (p < 0.001), with zero inference overhead. The method, which we call S0 tuning, optimizes one state matrix per recurrent layer while freezing all model weights. On Qwen3.5-4B (GatedDeltaNet hybrid), S0 tuning improves greedy pass@1 by +23.6 +/- 1.7 pp (10 seeds). On FalconH1-7B (Mamba-2 hybrid), S0 reaches 71.8% +/- 1.3 and LoRA reaches 71.4% +/- 2.4 (3 seeds): the two are statistically indistinguishable at this sample size, but S0 requires no weight merging. Cross-domain transfer is significant on MATH-500 (+4.8 pp, p = 0.00002, 8 seeds) and GSM8K (+2.8 pp, p = 0.0003, 10 seeds); a text-to-SQL benchmark (Spider) shows no transfer, consistent with the trajectory-steering mechanism. As a control, prefix tuning on a pure Transformer (Qwen2.5-3B) degrades performance under all nine configurations tested (-13.9 pp). On Qwen3.5, a per-step state-offset variant reaches +27.1 pp, above both S0 and LoRA, but at a per-step inference cost. Taken together, the results show that recurrent state initialization is a strong zero-inference-overhead parameter-efficient fine-tuning (PEFT) surface for hybrid language models when verified supervision is scarce. The tuned state is a ~48 MB file; task switching requires no weight merging or model reload. Code and library: https://github.com/jackyoung27/s0-tuning.
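To make the core idea concrete, here is a minimal sketch in pure Python under stated assumptions. It uses a simplified gated linear-attention recurrence, S_t = g * S_{t-1} + k_t v_t^T with read-out o_t = S_t^T q_t and a fixed scalar decay g. This is not the actual GatedDeltaNet or Mamba-2 update. The function names (`forward`, `loss_and_grad_s0`, `tune_s0`, `random_matrix`) and the toy data are hypothetical. Everything that plays the role of model weights (keys, values, queries, decay) stays frozen. Only the initial state S0, which normally starts at zeros, receives gradient updates. Because S_t depends on S0 only through g^t * S0, the gradient has a closed form.

```python
import random


def forward(s0, keys, values, queries, decay):
    """Run S_t = decay * S_{t-1} + k_t v_t^T from S_0 = s0 and return o_t = S_t^T q_t."""
    dk, dv = len(s0), len(s0[0])
    state = [row[:] for row in s0]  # copy so the caller's s0 is not modified
    outputs = []
    for k, v, q in zip(keys, values, queries):
        state = [[decay * state[i][j] + k[i] * v[j] for j in range(dv)]
                 for i in range(dk)]
        outputs.append([sum(state[i][j] * q[i] for i in range(dk))
                        for j in range(dv)])
    return outputs


def loss_and_grad_s0(s0, keys, values, queries, targets, decay):
    """Squared-error loss and its exact gradient with respect to s0 only."""
    dk, dv = len(s0), len(s0[0])
    outputs = forward(s0, keys, values, queries, decay)
    loss = 0.0
    grad = [[0.0] * dv for _ in range(dk)]
    for t, (o, y, q) in enumerate(zip(outputs, targets, queries), start=1):
        err = [oj - yj for oj, yj in zip(o, y)]
        loss += sum(e * e for e in err)
        # S_t = decay**t * s0 + (terms independent of s0), so
        # dL/ds0[i][j] accumulates 2 * decay**t * q_t[i] * err_t[j].
        scale = 2.0 * decay ** t
        for i in range(dk):
            for j in range(dv):
                grad[i][j] += scale * q[i] * err[j]
    return loss, grad


def tune_s0(keys, values, queries, targets, decay, dk, dv, lr=0.05, steps=200):
    """Plain gradient descent on the initial state; all other inputs stay frozen."""
    s0 = [[0.0] * dv for _ in range(dk)]  # start from the usual zero initial state
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad_s0(s0, keys, values, queries, targets, decay)
        history.append(loss)
        s0 = [[s0[i][j] - lr * grad[i][j] for j in range(dv)] for i in range(dk)]
    return s0, history


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy setup: the "frozen model" is the fixed keys/values/queries/decay.
rng = random.Random(0)
dk, dv, seq_len, decay = 3, 2, 8, 0.9
keys = random_matrix(rng, seq_len, dk)
values = random_matrix(rng, seq_len, dv)
queries = random_matrix(rng, seq_len, dk)
# Toy supervision: outputs of the same frozen recurrence under a hidden s0.
hidden_s0 = random_matrix(rng, dk, dv)
targets = forward(hidden_s0, keys, values, queries, decay)

tuned_s0, history = tune_s0(keys, values, queries, targets, decay, dk, dv)
print(f"loss: {history[0]:.4f} -> {history[-1]:.6f}")
```

At inference, the same `forward` runs with `tuned_s0` in place of the zero matrix. The per-token work is therefore unchanged, which is the sense in which this kind of adaptation carries no inference overhead. In this toy, the learning rate of 0.05 is small enough relative to the curvature of the quadratic loss that the loss decreases at every step.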
