Parallel Scaling Law: Unveiling Reasoning Generalization through A Cross-Linguistic Perspective

Abstract

Recent advances in Reinforcement Post-Training (RPT) have significantly enhanced the capabilities of Large Reasoning Models (LRMs), sparking increased interest in how well RL-based reasoning generalizes. Existing work has focused mainly on generalization across tasks or modalities; this study instead proposes a novel cross-linguistic perspective on reasoning generalization. This raises a crucial question: does the reasoning capability acquired through English RPT transfer effectively to other languages? We address it by systematically evaluating English-centric LRMs on multilingual reasoning benchmarks and introducing a metric that quantifies cross-lingual transferability. Our findings reveal that cross-lingual transferability varies significantly with the initial model, the target language, and the training paradigm. Through interventional studies, we find that models with stronger initial English capabilities tend to over-rely on English-specific patterns, which diminishes cross-lingual generalization. To address this, we conduct a thorough parallel training study. The experiments yield three key findings. First, a First-Parallel Leap: performance improves substantially when moving from monolingual training to training with just a single parallel language. Second, a predictable Parallel Scaling Law: cross-lingual reasoning transfer follows a power law in the number of parallel languages used in training. Third, a Monolingual Generalization Gap: the discrepancy between actual monolingual performance and the power-law prediction, which indicates that English-centric LRMs fail to fully generalize across languages. Our study challenges the assumption that LRM reasoning mirrors human cognition and provides critical insights for developing more language-agnostic LRMs.
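
To make the power-law claim in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the functional form f(n) = a * n^b for transferability as a function of the number n of parallel training languages, fits it by least squares in log-log space on multi-language runs (n >= 2), and takes the Monolingual Generalization Gap to be the extrapolated value at n = 1 minus the observed monolingual score. The function names, the choice to fit only on n >= 2, and all numbers are hypothetical and synthetic.

```python
import math


def fit_power_law(ns, ys):
    """Fit y = a * n**b by ordinary least squares on (log n, log y).

    Returns the pair (a, b). Assumes all ns and ys are positive.
    """
    xs = [math.log(n) for n in ns]
    ls = [math.log(y) for y in ys]
    mean_x = sum(xs) / len(xs)
    mean_l = sum(ls) / len(ls)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, ls))
    b = sxl / sxx                       # slope in log-log space = exponent
    a = math.exp(mean_l - b * mean_x)   # intercept in log-log space = log a
    return a, b


def monolingual_gap(ns, ys, actual_monolingual):
    """Power-law prediction at n = 1 minus the observed monolingual score.

    Assumption: the law is fitted on multi-language runs only and then
    extrapolated back to the single-language case.
    """
    a, b = fit_power_law(ns, ys)
    predicted_at_one = a * 1 ** b       # equals a, written out for clarity
    return predicted_at_one - actual_monolingual


# Synthetic, illustrative data only: generated from a = 0.5, b = 0.3.
langs = [2, 3, 4, 5]
scores = [0.5 * n ** 0.3 for n in langs]

a_hat, b_hat = fit_power_law(langs, scores)
gap = monolingual_gap(langs, scores, actual_monolingual=0.4)
```

Under these assumptions, a positive gap means the monolingual model scores below what the fitted trend would predict for a single training language, which is how the abstract describes English-centric LRMs.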
