S0 Tuning: Zero-Overhead Adaptation of Hybrid Recurrent-Attention Models

Abstract

Tuning a single initial state matrix per recurrent layer on roughly 48 execution-verified HumanEval training solutions, with zero inference overhead, outperforms LoRA by +10.8 pp (p < 0.001) on HumanEval. The method, which we call S0 tuning, optimizes one state matrix per recurrent layer while freezing all model weights. On Qwen3.5-4B (GatedDeltaNet hybrid), S0 tuning improves greedy pass@1 by +23.6 +/- 1.7 pp (10 seeds). On FalconH1-7B (Mamba-2 hybrid), S0 reaches 71.8% +/- 1.3 and LoRA reaches 71.4% +/- 2.4 (3 seeds). The two are statistically indistinguishable at this sample size, while S0 requires no weight merging. Cross-domain transfer is significant on MATH-500 (+4.8 pp, p = 0.00002, 8 seeds) and GSM8K (+2.8 pp, p = 0.0003, 10 seeds). A text-to-SQL benchmark (Spider) shows no transfer, consistent with the trajectory-steering mechanism. A prefix-tuning control on a pure Transformer (Qwen2.5-3B) degrades performance (-13.9 pp) under all nine configurations tested. On Qwen3.5, a per-step state-offset variant reaches +27.1 pp, above both S0 and LoRA, but at a per-step inference cost. Taken together, the results show that recurrent state initialization is a strong zero-inference-overhead PEFT surface for hybrid language models when verified supervision is scarce. The tuned state is a ~48 MB file; task switching requires no weight merging or model reload. Code and library: https://github.com/jackyoung27/s0-tuning.
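
To make "optimize only the initial state while freezing all weights" concrete, here is a minimal NumPy sketch. It is not the authors' implementation or the s0-tuning library API, and every function name below (gated_delta_scan, loss_and_grad, s0_tune, make_batch) is hypothetical.

The sketch makes the following assumptions:

- It uses a single toy recurrent head with a gated delta-rule update of the GatedDeltaNet form, S_t = S_{t-1} (alpha_t (I - beta_t k_t k_t^T)) + beta_t v_t k_t^T, with output y_t = S_t q_t.
- The q, k, v, alpha and beta inputs are random stand-ins for activations of frozen projections.
- The objective is squared error against synthetic targets.

Each state is affine in S0 (S_t = S0 P_t + C_t), so the gradient with respect to S0 has a closed form here. In a real hybrid model one would instead backpropagate the training loss through the frozen network to each layer's S0.

```python
import numpy as np


def gated_delta_scan(S0, q, k, v, alpha, beta):
    """Toy gated delta-rule recurrence started from initial state S0 (d_v x d_k).

    S_t = S_{t-1} @ (alpha_t * (I - beta_t k_t k_t^T)) + beta_t * v_t k_t^T
    y_t = S_t @ q_t
    Since S_t = S0 @ P_t + C_t with P_t = A_1 ... A_t, we also return
    u_t = P_t @ q_t, the direction through which S0 reaches y_t.
    """
    T, d_k = k.shape
    eye = np.eye(d_k)
    S, P = S0.copy(), eye.copy()
    ys, us = [], []
    for t in range(T):
        A = alpha[t] * (eye - beta[t] * np.outer(k[t], k[t]))
        S = S @ A + beta[t] * np.outer(v[t], k[t])
        P = P @ A                      # running product A_1 ... A_t
        ys.append(S @ q[t])
        us.append(P @ q[t])
    return np.stack(ys), np.stack(us)


def loss_and_grad(S0, batch):
    """Mean squared-error loss over sequences and its exact gradient w.r.t. S0 only."""
    loss, grad = 0.0, np.zeros_like(S0)
    for q, k, v, alpha, beta, target in batch:
        y, u = gated_delta_scan(S0, q, k, v, alpha, beta)
        err = y - target                      # (T, d_v)
        loss += 0.5 * np.sum(err ** 2)
        grad += err.T @ u                     # sum_t outer(err_t, u_t)
    return loss / len(batch), grad / len(batch)


def s0_tune(batch, d_v, d_k, lr=0.1, steps=300):
    """Plain gradient descent on S0; q, k, v, alpha, beta are never updated."""
    S0 = np.zeros((d_v, d_k))
    for _ in range(steps):
        _, grad = loss_and_grad(S0, batch)
        S0 -= lr * grad
    return S0


def make_batch(rng, S0_true, n_seq=16, T=8):
    """Synthetic sequences whose targets come from a hidden 'good' initial state."""
    d_v, d_k = S0_true.shape
    batch = []
    for _ in range(n_seq):
        q = rng.normal(size=(T, d_k))
        q /= np.linalg.norm(q, axis=1, keepdims=True)   # L2-normalized queries
        k = rng.normal(size=(T, d_k))
        k /= np.linalg.norm(k, axis=1, keepdims=True)   # L2-normalized keys
        v = rng.normal(size=(T, d_v))
        alpha = rng.uniform(0.8, 1.0, size=T)           # decay gate
        beta = rng.uniform(0.1, 0.9, size=T)            # write strength
        target, _ = gated_delta_scan(S0_true, q, k, v, alpha, beta)
        batch.append((q, k, v, alpha, beta, target))
    return batch


rng = np.random.default_rng(0)
batch = make_batch(rng, rng.normal(size=(3, 4)))
before, _ = loss_and_grad(np.zeros((3, 4)), batch)
after, _ = loss_and_grad(s0_tune(batch, d_v=3, d_k=4), batch)
print(f"loss before S0 tuning: {before:.4f}, after: {after:.6f}")
```

The tuned S0 is consumed only at t = 0, and the loop body of gated_delta_scan is identical before and after tuning. That mirrors the zero-inference-overhead property described above. A per-step state offset would instead add a term inside the loop at every step. Switching tasks in this toy means passing a different S0 array, with the frozen inputs untouched.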
