Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks
May 23, 2023
Authors: Tiedong Liu, Bryan Kian Hsiang Low
cs.AI
Abstract
We introduce Goat, a fine-tuned LLaMA model that significantly outperforms
GPT-4 on a range of arithmetic tasks. Fine-tuned on a synthetically generated
dataset, Goat achieves state-of-the-art performance on the BIG-bench arithmetic
sub-tasks. In particular, the zero-shot Goat-7B matches or even surpasses the
accuracy achieved by the few-shot PaLM-540B. Surprisingly, Goat can achieve
near-perfect accuracy on large-number addition and subtraction through
supervised fine-tuning alone, which is almost impossible for previous
pretrained language models such as BLOOM, OPT, and GPT-NeoX. We attribute
Goat's exceptional performance to LLaMA's consistent tokenization of numbers.
To tackle more challenging tasks like large-number multiplication and division,
we propose an approach that classifies tasks based on their learnability, and
subsequently decomposes unlearnable tasks, such as multi-digit multiplication
and division, into a series of learnable tasks by leveraging basic arithmetic
principles. We thoroughly examine the performance of our model, offering a
comprehensive evaluation of the effectiveness of our proposed decomposition
steps. Additionally, Goat-7B can be easily trained using LoRA on a 24GB VRAM
GPU, facilitating reproducibility for other researchers. We release our model,
dataset, and the Python script for dataset generation.
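As a rough illustration of the decomposition idea (not the authors' released dataset-generation script), the sketch below expands a multi-digit multiplication into place-value partial products followed by a chain of additions, so that each step is a task the abstract describes as learnable. The function name decompose_multiplication and the exact step format are assumptions made for this example.

# Illustrative sketch only (not the released script): generate one
# chain-of-thought style training example that decomposes multi-digit
# multiplication into simpler, learnable steps, in the spirit of the
# "basic arithmetic principles" decomposition described in the abstract.
import random

def decompose_multiplication(a: int, b: int) -> str:
    """Expand a * b into place-value partial products, then sum them step by step."""
    digits = [int(d) for d in str(b)]
    n = len(digits)
    # Split b into its place-value components, e.g. 394 -> 300 + 90 + 4.
    parts = [d * 10 ** (n - 1 - i) for i, d in enumerate(digits) if d != 0]
    steps = [f"{a} * {b} = {a} * ({' + '.join(str(p) for p in parts)})"]
    # Each partial product reduces to a 1-digit multiplication shifted by a power of ten.
    products = [a * p for p in parts]
    steps.append(
        " + ".join(f"{a} * {p}" for p in parts)
        + " = "
        + " + ".join(str(q) for q in products)
    )
    # Accumulate the partial products with a chain of ordinary additions.
    total = products[0]
    for q in products[1:]:
        steps.append(f"{total} + {q} = {total + q}")
        total += q
    return "\n".join(steps) + f"\nAnswer: {total}"

if __name__ == "__main__":
    a, b = random.randint(100, 99999), random.randint(100, 9999)
    print(decompose_multiplication(a, b))

A training pair would then couple the prompt "{a} * {b} =" with the generated step-by-step target; division would require an analogous long-division-style decomposition, which is not sketched here.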