TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise

October 29, 2023
Authors: Nan He, Hanyu Lai, Chenyang Zhao, Zirui Cheng, Junting Pan, Ruoyu Qin, Ruofan Lu, Rui Lu, Yunchen Zhang, Gangming Zhao, Zhaohui Hou, Zhiyuan Huang, Shaoqing Lu, Ding Liang, Mingjie Zhan
cs.AI

Abstract

Large Language Models (LLMs) exhibit impressive reasoning and data augmentation capabilities in various NLP tasks. However, what about small models? In this work, we propose TeacherLM-7.1B, capable of annotating the relevant fundamentals, chain of thought, and common mistakes for most NLP samples, making each annotation more than just an answer and thus allowing other models to learn "why" instead of just "what". The TeacherLM-7.1B model achieved a zero-shot score of 52.3 on MMLU, surpassing most models with over 100B parameters. Even more remarkable is its data augmentation ability. Based on TeacherLM-7.1B, we augmented 58 NLP datasets and taught various student models of different sizes from the OPT and BLOOM series in a multi-task setting. The experimental results indicate that the data augmentation provided by TeacherLM brings significant benefits. We will release the TeacherLM series of models and the augmented datasets as open source.
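
The abstract does not specify the exact annotation format, but the core idea is that each training sample carries more than a bare answer. The minimal Python sketch below illustrates one plausible way such an augmented sample could be structured and flattened into training text for a student model; it is not the authors' released code, and all field names (fundamentals, chain_of_thought, common_mistakes) are illustrative assumptions drawn from the abstract's description.

```python
# Hypothetical sketch of a TeacherLM-style augmented sample: alongside the
# question and answer, each record carries the fundamentals, a chain of
# thought, and common mistakes, so a student model can learn "why", not
# just "what". Field names are assumptions, not the paper's actual schema.
from dataclasses import dataclass


@dataclass
class AugmentedSample:
    question: str
    answer: str
    fundamentals: str       # background knowledge the question relies on
    chain_of_thought: str   # step-by-step reasoning toward the answer
    common_mistakes: str    # typical errors a learner should avoid


def to_training_text(sample: AugmentedSample) -> str:
    """Flatten an augmented sample into one training string for a student
    model (e.g., an OPT or BLOOM checkpoint) in a multi-task setting."""
    return (
        f"Question: {sample.question}\n"
        f"Fundamentals: {sample.fundamentals}\n"
        f"Chain of thought: {sample.chain_of_thought}\n"
        f"Common mistakes: {sample.common_mistakes}\n"
        f"Answer: {sample.answer}"
    )


if __name__ == "__main__":
    sample = AugmentedSample(
        question="Which planet is closest to the Sun?",
        answer="Mercury",
        fundamentals="The solar system's planets, ordered by distance from the Sun.",
        chain_of_thought="List the planets outward from the Sun: Mercury, Venus, Earth, ...; the first listed is the closest.",
        common_mistakes="Confusing the hottest planet (Venus) with the closest one (Mercury).",
    )
    print(to_training_text(sample))
```

Under this framing, the augmentation step amounts to filling in the three explanatory fields for each of the 58 datasets' samples, and student training amounts to fine-tuning on the flattened text rather than on question-answer pairs alone.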