Parallel Scaling Law: Unveiling Reasoning Generalization through A Cross-Linguistic Perspective
October 2, 2025
Authors: Wen Yang, Junhong Wu, Chong Li, Chengqing Zong, Jiajun Zhang
cs.AI
Abstract
Recent advancements in Reinforcement Post-Training (RPT) have significantly
enhanced the capabilities of Large Reasoning Models (LRMs), sparking increased
interest in the generalization of RL-based reasoning. While existing work has
primarily focused on investigating its generalization across tasks or
modalities, this study proposes a novel cross-linguistic perspective to
investigate reasoning generalization. This raises a crucial question:
Does the reasoning capability achieved from English RPT effectively
transfer to other languages? We address this by systematically evaluating
English-centric LRMs on multilingual reasoning benchmarks and introducing a
metric to quantify cross-lingual transferability. Our findings reveal that
cross-lingual transferability varies significantly across initial models, target
languages, and training paradigms. Through interventional studies, we find that
models with stronger initial English capabilities tend to over-rely on
English-specific patterns, leading to diminished cross-lingual generalization.
To address this, we conduct a thorough parallel training study. Experimental
results yield three key findings: first, the First-Parallel Leap, a
substantial jump in performance when moving from monolingual training to
training with just one parallel language; second, a predictable Parallel
Scaling Law, showing that cross-lingual reasoning transfer follows a
power law in the number of parallel training languages; third, we identify the
discrepancy between actual monolingual performance and the power-law prediction
as the Monolingual Generalization Gap, indicating that English-centric
LRMs fail to fully generalize across languages. Our study challenges the assumption
that LRM reasoning mirrors human cognition, providing critical insights for the
development of more language-agnostic LRMs.
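The abstract's Parallel Scaling Law states that multilingual reasoning performance follows a power law in the number of parallel training languages, and the Monolingual Generalization Gap is the shortfall of actual monolingual performance relative to that law's prediction at one language. As a minimal sketch of how such a law could be fit and the gap computed (the paper's exact functional form, metric, and numbers are not given here; all data below are invented for illustration):

```python
import math

def fit_power_law(n_langs, scores):
    """Least-squares fit of scores ~ a * n**b via linear regression
    in log-log space; returns (a, b)."""
    xs = [math.log(n) for n in n_langs]
    ys = [math.log(s) for s in scores]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    a = math.exp(my - b * mx)
    return a, b

# Hypothetical average multilingual accuracies when training with
# 2..8 parallel languages (invented numbers, illustration only).
n_parallel = [2, 3, 4, 5, 6, 7, 8]
accuracy = [44.3, 47.1, 49.2, 50.9, 52.3, 53.5, 54.6]

a, b = fit_power_law(n_parallel, accuracy)

# The power-law prediction at n = 1 is a * 1**b = a; the Monolingual
# Generalization Gap is how far the actual English-only score falls
# short of it.
actual_monolingual = 38.0  # hypothetical English-only score
gap = a - actual_monolingual
```

Fitting in log-log space turns the power law into a straight line, so an ordinary least-squares slope and intercept directly give the exponent `b` and the coefficient `a`.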