

Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks

May 23, 2023
Authors: Tiedong Liu, Bryan Kian Hsiang Low
cs.AI

Abstract

We introduce Goat, a fine-tuned LLaMA model that significantly outperforms GPT-4 on a range of arithmetic tasks. Fine-tuned on a synthetically generated dataset, Goat achieves state-of-the-art performance on BIG-bench arithmetic sub-task. In particular, the zero-shot Goat-7B matches or even surpasses the accuracy achieved by the few-shot PaLM-540B. Surprisingly, Goat can achieve near-perfect accuracy on large-number addition and subtraction through supervised fine-tuning only, which is almost impossible with previous pretrained language models, such as Bloom, OPT, GPT-NeoX, etc. We attribute Goat's exceptional performance to LLaMA's consistent tokenization of numbers. To tackle more challenging tasks like large-number multiplication and division, we propose an approach that classifies tasks based on their learnability, and subsequently decomposes unlearnable tasks, such as multi-digit multiplication and division, into a series of learnable tasks by leveraging basic arithmetic principles. We thoroughly examine the performance of our model, offering a comprehensive evaluation of the effectiveness of our proposed decomposition steps. Additionally, Goat-7B can be easily trained using LoRA on a 24GB VRAM GPU, facilitating reproducibility for other researchers. We release our model, dataset, and the Python script for dataset generation.
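The abstract's key technique is to decompose tasks that are not directly learnable (multi-digit multiplication and division) into chains of simpler, learnable steps, and to train on synthetically generated data. The sketch below is an illustrative reconstruction of that idea, not the authors' released script: the function names, prompt format, and the particular place-value decomposition are assumptions made for the example.

```python
# Illustrative sketch (not the authors' released generator): build synthetic
# arithmetic examples, decomposing multi-digit multiplication into partial
# products by place value, in the spirit of the paper's approach.
import random


def addition_example(max_digits=16):
    """Directly learnable task: large-number addition answered in one step."""
    a = random.randint(0, 10 ** max_digits)
    b = random.randint(0, 10 ** max_digits)
    return {"prompt": f"{a} + {b} =", "target": str(a + b)}


def multiplication_example(max_digits=4):
    """Multi-digit multiplication decomposed into a chain of simpler steps."""
    a = random.randint(10 ** (max_digits - 1), 10 ** max_digits - 1)
    b = random.randint(10 ** (max_digits - 1), 10 ** max_digits - 1)

    # Split b by place value, e.g. 5678 -> 5000 + 600 + 70 + 8.
    parts = [int(d) * 10 ** i
             for i, d in enumerate(reversed(str(b))) if d != "0"]
    parts.reverse()

    # One partial product per part, then a running sum to the final answer.
    steps = [f"{a} * {b} = {a} * ({' + '.join(str(p) for p in parts)})"]
    partials = [a * p for p in parts]
    steps.append(" + ".join(f"{a} * {p}" for p in parts)
                 + " = " + " + ".join(str(pp) for pp in partials))
    total = partials[0]
    for pp in partials[1:]:
        steps.append(f"{total} + {pp} = {total + pp}")
        total += pp

    return {"prompt": f"{a} * {b} =", "target": "\n".join(steps)}


if __name__ == "__main__":
    random.seed(0)
    print(addition_example())
    print(multiplication_example())
```

The design point this illustrates is that addition and subtraction are supervised directly as single-step targets, while multiplication is supervised as an explicit chain of intermediate results so that each individual step stays within what the model can learn reliably.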