Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
August 29, 2024
作者: Tian Ye, Zicheng Xu, Yuanzhi Li, Zeyuan Allen-Zhu
cs.AI
Abstract
Language models have demonstrated remarkable performance in solving reasoning
tasks; however, even the strongest models still occasionally make reasoning
mistakes. Recently, there has been active research aimed at improving reasoning
accuracy, particularly by using pretrained language models to "self-correct"
their mistakes via multi-round prompting. In this paper, we follow this line of
work but focus on understanding the usefulness of incorporating
"error-correction" data directly into the pretraining stage. This data consists
of erroneous solution steps immediately followed by their corrections. Using a
synthetic math dataset, we show promising results: this type of pretraining data
can help language models achieve higher reasoning accuracy directly (i.e.,
through simple auto-regression, without multi-round prompting) compared to
pretraining on the same amount of error-free data. We also delve into many
details, such as (1) how this approach differs from beam search, (2) how such
data can be prepared, (3) whether masking is needed on the erroneous tokens,
(4) the amount of error required, (5) whether such data can be deferred to the
fine-tuning stage, and many others.
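
To make two of the details above concrete, namely (2) how error-correction data might be prepared and (3) what masking the erroneous tokens could mean, here is a minimal PyTorch sketch. It is illustrative only: the "[BACK]" correction marker, the toy vocabulary, and all function names are assumptions for this example, not the paper's actual data format.

```python
# Minimal sketch: build an error-correction training sequence (a wrong step
# immediately followed by its correction) and compute a next-token loss that
# optionally masks the erroneous tokens. All names here are illustrative.
import torch
import torch.nn.functional as F

IGNORE = -100  # F.cross_entropy skips targets equal to this ignore_index


def make_retry_example(correct_steps, wrong_step, pos):
    """Insert one erroneous step at `pos`, immediately followed by a
    correction marker, then resume the correct solution. Returns the
    token list and the [start, end) span covering the wrong step."""
    seq = correct_steps[:pos] + [wrong_step, "[BACK]"] + correct_steps[pos:]
    return seq, (pos, pos + 1)


def next_token_labels(ids, error_span, mask_errors):
    """Standard next-token labels. If mask_errors is True, positions whose
    target lies inside the erroneous span get IGNORE, so the model sees
    mistakes in context but is never trained to produce them; "[BACK]"
    stays supervised so the correction behavior itself is learned."""
    labels = ids[1:] + [IGNORE]  # position t predicts token t + 1
    if mask_errors:
        start, end = error_span
        for t in range(start - 1, end - 1):  # targets t + 1 in [start, end)
            if t >= 0:
                labels[t] = IGNORE
    return labels


# Tiny worked example: a correct 3-step solution with one injected mistake.
correct = ["A=5", "B=A+2", "B=7"]
seq, span = make_retry_example(correct, wrong_step="B=8", pos=2)
# seq == ["A=5", "B=A+2", "B=8", "[BACK]", "B=7"]

vocab = {tok: i for i, tok in enumerate(sorted(set(seq)))}
ids = [vocab[t] for t in seq]
labels = next_token_labels(ids, span, mask_errors=True)

# Random logits stand in for a language model's output; masked positions
# contribute nothing to the loss.
logits = torch.randn(len(ids), len(vocab))
loss = F.cross_entropy(logits, torch.tensor(labels), ignore_index=IGNORE)
print(seq)
print(labels)
print(float(loss))
```

In this sketch, the masking flag controls whether the model merely conditions on mistakes appearing in context or is also trained to emit them; question (3) in the abstract asks precisely whether such masking is necessary.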