TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise
October 29, 2023
Authors: Nan He, Hanyu Lai, Chenyang Zhao, Zirui Cheng, Junting Pan, Ruoyu Qin, Ruofan Lu, Rui Lu, Yunchen Zhang, Gangming Zhao, Zhaohui Hou, Zhiyuan Huang, Shaoqing Lu, Ding Liang, Mingjie Zhan
cs.AI
Abstract
Large Language Models (LLMs) exhibit impressive reasoning and data
augmentation capabilities in various NLP tasks. However, what about small
models? In this work, we propose TeacherLM-7.1B, capable of annotating relevant
fundamentals, chain of thought, and common mistakes for most NLP samples, which
makes annotation more than just an answer, thus allowing other models to learn
"why" instead of just "what". The TeacherLM-7.1B model achieved a zero-shot
score of 52.3 on MMLU, surpassing most models with over 100B parameters. Even
more remarkable is its data augmentation ability. Based on TeacherLM-7.1B, we
augmented 58 NLP datasets and used them to teach student models of various
sizes from the OPT and BLOOM series in a multi-task setting. The experimental
results indicate that the data augmentation provided by TeacherLM yields
significant benefits. We will release the TeacherLM series of models and the
augmented datasets as open source.
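As a purely illustrative sketch of the augmentation idea described above, an augmented sample could carry the fundamentals, chain of thought, and common mistakes alongside the answer; the field names and composition function below are assumptions for illustration, not the schema of the released datasets.

# Hypothetical sketch of a TeacherLM-style augmented sample (field names are
# assumptions; the released datasets may use a different format).
augmented_sample = {
    "question": "Which planet in the solar system has the shortest year?",
    "answer": "Mercury",
    # Annotations that go beyond the bare answer, so a student model can
    # learn "why" rather than just "what":
    "fundamentals": "A planet's year is its orbital period; by Kepler's third "
                    "law, the period grows with distance from the Sun.",
    "chain_of_thought": "Mercury is the innermost planet, so it has the "
                        "smallest orbit and therefore the shortest year "
                        "(about 88 Earth days).",
    "common_mistakes": "Confusing rotation (day length) with revolution "
                       "(year length); Venus has the longest day, not the "
                       "shortest year.",
}

def to_training_text(sample: dict) -> str:
    """Concatenate the annotations into one training string for a student model."""
    return (
        f"Question: {sample['question']}\n"
        f"Fundamentals: {sample['fundamentals']}\n"
        f"Chain of thought: {sample['chain_of_thought']}\n"
        f"Common mistakes: {sample['common_mistakes']}\n"
        f"Answer: {sample['answer']}\n"
    )

print(to_training_text(augmented_sample))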