

Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems

August 29, 2024
作者: Tian Ye, Zicheng Xu, Yuanzhi Li, Zeyuan Allen-Zhu
cs.AI

Abstract

Language models have demonstrated remarkable performance in solving reasoning tasks; however, even the strongest models still occasionally make reasoning mistakes. Recently, there has been active research aimed at improving reasoning accuracy, particularly by using pretrained language models to "self-correct" their mistakes via multi-round prompting. In this paper, we follow this line of work but focus on understanding the usefulness of incorporating "error-correction" data directly into the pretraining stage. This data consists of erroneous solution steps immediately followed by their corrections. Using a synthetic math dataset, we show promising results: this type of pretrain data can help language models achieve higher reasoning accuracy directly (i.e., through simple auto-regression, without multi-round prompting) compared to pretraining on the same amount of error-free data. We also delve into many details, such as (1) how this approach differs from beam search, (2) how such data can be prepared, (3) whether masking is needed on the erroneous tokens, (4) the amount of error required, (5) whether such data can be deferred to the fine-tuning stage, and many others.
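The abstract describes "error-correction" pretraining data as erroneous solution steps immediately followed by their corrections. The paper does not specify the exact data format, but the construction can be sketched roughly as follows; the `RETRY_TOKEN` marker, the function name, and the injection scheme are all illustrative assumptions, not the authors' actual pipeline:

```python
import random

# Hypothetical marker signaling "the previous step was wrong; retry".
RETRY_TOKEN = "[BACK]"

def make_error_correction_example(correct_steps, wrong_step_pool, error_rate, rng):
    """Interleave erroneous steps into an otherwise error-free solution.

    Each injected wrong step is immediately followed by a retry marker and
    then the correct step, mimicking the "error then immediate correction"
    format the abstract describes. `error_rate` controls how often an error
    is injected before a correct step.
    """
    out = []
    for step in correct_steps:
        if rng.random() < error_rate:
            out.append(rng.choice(wrong_step_pool))  # inject an erroneous step
            out.append(RETRY_TOKEN)                  # mark it as a mistake
        out.append(step)                             # then emit the correct step
    return out

rng = random.Random(0)
steps = ["x = 3 + 4 = 7", "y = x * 2 = 14", "answer = y + 1 = 15"]
wrong = ["x = 3 + 4 = 8", "y = x * 2 = 12"]  # plausible-looking mistakes
seq = make_error_correction_example(steps, wrong, error_rate=0.5, rng=rng)
```

Under this sketch, the model is trained autoregressively on `seq`, so at inference it can emit `[BACK]` after a mistaken step and continue with the correction in a single pass, with no multi-round prompting. Whether the loss on the erroneous tokens themselves should be masked is one of the questions the paper studies.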
November 14, 2024