Scaling Relationship on Learning Mathematical Reasoning with Large Language Models

Abstract

Mathematical reasoning is a challenging task for large language models (LLMs), and how it scales with LLM capacity remains under-explored. In this paper, we investigate how pre-training loss, the amount of supervised data, and the amount of augmented data influence the reasoning performance of a supervised LLM. We find that pre-training loss is a better indicator of a model's performance than its parameter count. We apply supervised fine-tuning (SFT) with different amounts of supervised data and empirically find a log-linear relation between data amount and model performance; we also find that better models improve less as the supervised dataset grows. To augment the data and improve model performance without any human effort, we propose Rejection sampling Fine-Tuning (RFT). RFT uses supervised models to generate and collect correct reasoning paths, which then serve as an augmented fine-tuning dataset. We find that RFT improves mathematical reasoning performance more when the augmented samples contain more distinct reasoning paths, and that it brings larger improvements for less performant LLMs. Furthermore, we combine rejection samples from multiple models, which pushes LLaMA-7B to an accuracy of 49.3%, significantly outperforming the SFT accuracy of 35.9%.
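
The data-collection step of RFT described in the abstract (sample reasoning paths from a supervised model, keep the correct ones, and prefer distinct paths) can be illustrated with a minimal sketch. This is not the authors' implementation. The names `collect_rft_paths`, `extract_answer`, and `sample_path` are hypothetical, the `####` final-answer marker is an assumed output format, and treating two paths as identical when they match after whitespace normalization is an assumed, simplified notion of "distinct".

```python
from typing import Callable, List


def extract_answer(path: str) -> str:
    """Return the text after the last '####' marker, or '' if there is none.

    The '####' marker is an assumed convention for the final answer.
    """
    marker = "####"
    if marker not in path:
        return ""
    return path.rsplit(marker, 1)[1].strip()


def collect_rft_paths(
    question: str,
    gold_answer: str,
    sample_path: Callable[[str], str],
    k: int,
) -> List[str]:
    """Sample k reasoning paths and keep the correct, distinct ones.

    sample_path is a hypothetical stand-in for drawing one sampled
    generation from a supervised fine-tuned model.
    """
    kept: List[str] = []
    seen = set()
    for _ in range(k):
        path = sample_path(question)
        # Rejection step: discard paths whose final answer is wrong.
        if extract_answer(path) != gold_answer.strip():
            continue
        # Assumed distinctness criterion: equality after collapsing whitespace.
        key = " ".join(path.split())
        if key in seen:
            continue
        seen.add(key)
        kept.append(path)
    return kept
```

Under these assumptions, the kept paths for every training question would be pooled into the augmented fine-tuning dataset, and paths sampled from several different models could be merged by sharing one `seen` set per question.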
