
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

December 11, 2023
作者: Avi Singh, John D. Co-Reyes, Rishabh Agarwal, Ankesh Anand, Piyush Patil, Peter J. Liu, James Harrison, Jaehoon Lee, Kelvin Xu, Aaron Parisi, Abhishek Kumar, Alex Alemi, Alex Rizkowsky, Azade Nova, Ben Adlam, Bernd Bohnet, Hanie Sedghi, Igor Mordatch, Isabelle Simpson, Izzeddin Gur, Jasper Snoek, Jeffrey Pennington, Jiri Hron, Kathleen Kenealy, Kevin Swersky, Kshiteej Mahajan, Laura Culp, Lechao Xiao, Maxwell L. Bileschi, Noah Constant, Roman Novak, Rosanne Liu, Tris Warkentin, Yundi Qian, Ethan Dyer, Behnam Neyshabur, Jascha Sohl-Dickstein, Noah Fiedel
cs.AI

Abstract

Fine-tuning language models~(LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go beyond human data on tasks where we have access to scalar feedback, for example, on math problems where one can verify correctness. To do so, we investigate a simple self-training method based on expectation-maximization, which we call ReST^{EM}, where we (1) generate samples from the model and filter them using binary feedback, (2) fine-tune the model on these samples, and (3) repeat this process a few times. Testing on advanced MATH reasoning and APPS coding benchmarks using PaLM-2 models, we find that ReST^{EM} scales favorably with model size and significantly surpasses fine-tuning only on human data. Overall, our findings suggest self-training with feedback can substantially reduce dependence on human-generated data.
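The abstract's three-step loop — (1) sample from the model and filter with binary feedback, (2) fine-tune on the filtered samples, (3) repeat — can be sketched as a toy self-training loop. This is a minimal illustrative sketch, not the paper's PaLM-2 pipeline: `generate`, `verify`, and `finetune` are hypothetical stand-ins, and the "model" is a single integer estimate on an arithmetic-style task with verifiable answers.

```python
def generate(model, problem):
    """Step (1a): deterministically propose candidate solutions near the
    model's current estimate (a stand-in for sampling from an LM)."""
    return [model["estimate"] + delta for delta in range(-2, 3)]

def verify(problem, solution):
    """Step (1b): binary feedback -- 1 if the solution is provably
    correct (as on math problems where correctness can be checked)."""
    return solution == problem["answer"]

def finetune(model, correct_samples):
    """Step (2): fit the model to its own verified outputs
    (the M-step on data collected in the E-step)."""
    if correct_samples:
        model["estimate"] = round(sum(correct_samples) / len(correct_samples))
    return model

def rest_em(model, problems, iterations=3):
    """Step (3): repeat the generate-filter-finetune cycle a few times."""
    for _ in range(iterations):
        filtered = [s for p in problems
                    for s in generate(model, p) if verify(p, s)]
        model = finetune(model, filtered)
    return model

model = rest_em({"estimate": 40}, [{"answer": 42}])
print(model["estimate"])  # the estimate converges onto the verified answer: 42
```

The key property the sketch mirrors is that no human-written solutions enter the loop: training data comes entirely from the model's own outputs, kept only when an external check confirms correctness.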