Parallel Scaling Law: Unveiling Reasoning Generalization through A Cross-Linguistic Perspective

Abstract

Recent advances in Reinforcement Post-Training (RPT) have significantly enhanced the capabilities of Large Reasoning Models (LRMs), sparking increased interest in how well RL-based reasoning generalizes. Existing work has focused primarily on generalization across tasks or modalities; this study instead proposes a cross-linguistic perspective on reasoning generalization. This raises a crucial question: does the reasoning capability acquired through English RPT transfer effectively to other languages? We address it by systematically evaluating English-centric LRMs on multilingual reasoning benchmarks and introducing a metric that quantifies cross-lingual transferability. Our findings reveal that cross-lingual transferability varies significantly with the initial model, the target language, and the training paradigm. Through interventional studies, we find that models with stronger initial English capabilities tend to over-rely on English-specific patterns, which diminishes cross-lingual generalization. To address this, we conduct a thorough parallel training study. The experiments yield three key findings. First, the First-Parallel Leap: a substantial jump in performance when moving from monolingual training to just a single parallel language. Second, a predictable Parallel Scaling Law: cross-lingual reasoning transfer follows a power law in the number of parallel training languages. Third, the Monolingual Generalization Gap: the discrepancy between actual monolingual performance and the power-law prediction, which indicates that English-centric LRMs fail to fully generalize across languages. Our study challenges the assumption that LRM reasoning mirrors human cognition and provides critical insights for developing more language-agnostic LRMs.
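To make the power-law claim concrete, the following is a minimal sketch of how a power law of the form y = a * x^b could be fitted and then extrapolated back to the monolingual case. Everything in it is an assumption for illustration: the abstract does not give the functional form, the fitting procedure, or any numbers. The helper names (`fit_power_law`, `predict`), the choice of x as the total number of training languages with x = 1 meaning monolingual, the sign convention of the gap, and all data values are hypothetical and synthetic.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by ordinary least squares in log-log space."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx                      # exponent = slope in log-log space
    a = math.exp(mean_y - b * mean_x)  # coefficient = exp(intercept)
    return a, b


def predict(a, b, x):
    """Evaluate the fitted power law at x."""
    return a * x ** b


# Synthetic scores generated from y = 40 * x**0.2 (NOT results from the paper).
num_languages = [2, 3, 5, 7]  # assumed: total training languages, parallel runs only
transfer_scores = [40.0 * n ** 0.2 for n in num_languages]

a, b = fit_power_law(num_languages, transfer_scores)

# Extrapolate the fitted curve to the monolingual setting (assumed x = 1)
# and compare with a hypothetical observed monolingual score.
predicted_mono = predict(a, b, 1)
observed_mono = 35.0  # hypothetical value
gap = predicted_mono - observed_mono  # assumed sign convention

print(f"a={a:.2f}, b={b:.2f}, gap={gap:.2f}")
```

Fitting in log-log space turns the power law into a straight line, so the exponent is the slope. Under these assumptions, a positive gap would mean the monolingual model performs below what the curve fitted on parallel-trained models predicts.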
