

RATIONALYST: Pre-training Process-Supervision for Improving Reasoning

October 1, 2024
Authors: Dongwei Jiang, Guoxuan Wang, Yining Lu, Andrew Wang, Jingyu Zhang, Chuyu Liu, Benjamin Van Durme, Daniel Khashabi
cs.AI

Abstract

The reasoning steps generated by LLMs might be incomplete, as they mimic logical leaps common in everyday communication found in their pre-training data: underlying rationales are frequently left implicit (unstated). To address this challenge, we introduce RATIONALYST, a model for process-supervision of reasoning based on pre-training on a vast collection of rationale annotations extracted from unlabeled data. We extract 79k rationales from a web-scale unlabelled dataset (the Pile) and a combination of reasoning datasets with minimal human intervention. This web-scale pre-training for reasoning allows RATIONALYST to consistently generalize across diverse reasoning tasks, including mathematical, commonsense, scientific, and logical reasoning. Fine-tuned from LLaMa-3-8B, RATIONALYST improves the accuracy of reasoning by an average of 3.9% on 7 representative reasoning benchmarks. It also demonstrates superior performance compared to significantly larger verifiers like GPT-4 and similarly sized models fine-tuned on matching training sets.
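
To make the process-supervision idea concrete, below is a minimal sketch of what rationale-guided step selection could look like at inference time: a rationale predicted for the next step is used to score candidate reasoning steps. The base model path, prompt layout, and average-log-probability scoring heuristic are illustrative assumptions, not the paper's exact implementation.

```python
# Hedged sketch of rationale-guided process supervision.
# Model path, prompt format, and scoring heuristic are illustrative
# assumptions, not the RATIONALYST paper's exact recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Meta-Llama-3-8B"  # stand-in; RATIONALYST is fine-tuned from LLaMa-3-8B

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)
model.eval()

def avg_logprob(context: str, continuation: str) -> float:
    """Average log-probability of `continuation` given `context`.

    Token counts at the context/continuation boundary are approximate,
    which is acceptable for a ranking heuristic.
    """
    ctx_ids = tok(context, return_tensors="pt").input_ids
    full_ids = tok(context + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    cont_len = full_ids.shape[1] - ctx_ids.shape[1]
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    token_lp = logprobs[torch.arange(targets.shape[0]), targets]
    return token_lp[-cont_len:].mean().item()  # score only the continuation

def pick_next_step(question: str, steps_so_far: list[str],
                   rationale: str, candidates: list[str]) -> str:
    """Prefer the candidate step most consistent with the predicted rationale."""
    prefix = (question + "\n" + "\n".join(steps_so_far)
              + "\nRationale: " + rationale + "\n")
    return max(candidates, key=lambda c: avg_logprob(prefix, c))
```

In this sketch the rationale acts as a soft constraint: conditioning the scorer on it raises the likelihood of candidate steps that make the implicit reasoning explicit, which is one plausible way a process supervisor could steer generation at each step.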

