Towards Robust and Efficient Continual Language Learning

July 11, 2023
Authors: Adam Fisch, Amal Rannen-Triki, Razvan Pascanu, Jörg Bornschein, Angeliki Lazaridou, Elena Gribovskaya, Marc'Aurelio Ranzato
cs.AI

Abstract

As the application space of language models continues to evolve, a natural question to ask is how we can quickly adapt models to new tasks. We approach this classic question from a continual learning perspective, in which we aim to continue fine-tuning models trained on past tasks on new tasks, with the goal of "transferring" relevant knowledge. However, this strategy also runs the risk of doing more harm than good, i.e., negative transfer. In this paper, we construct a new benchmark of task sequences that target different possible transfer scenarios one might face, such as a sequence of tasks with high potential for positive transfer, high potential for negative transfer, no expected effect, or a mixture of each. An ideal learner should be able to maximally exploit information from all tasks that have any potential for positive transfer, while also avoiding the negative effects of any distracting tasks that may confuse it. We then propose a simple, yet effective, learner that satisfies many of our desiderata simply by leveraging a selective strategy for initializing new models from past task checkpoints. Still, limitations remain, and we hope this benchmark can help the community to further build and analyze such learners.
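To make the selective-initialization idea concrete, here is a minimal Python sketch, assuming the learner can score each candidate on held-out data from the new task: it compares past-task checkpoints against the pretrained base model, initializes from whichever scores best, and then fine-tunes on the new task. The helper names (`evaluate_loss`, `fine_tune`) are hypothetical stand-ins for a standard evaluation/training loop, and the selection criterion shown is one plausible choice rather than the paper's exact rule.

```python
# A minimal sketch of selective checkpoint initialization for continual
# fine-tuning. Illustrative only, not the paper's exact procedure;
# `evaluate_loss` and `fine_tune` are hypothetical helpers.
from typing import Any, Callable, Dict


def select_initialization(
    base_model: Any,
    past_checkpoints: Dict[str, Any],
    evaluate_loss: Callable[[Any], float],
) -> Any:
    """Pick the candidate with the lowest validation loss on the new task."""
    best_model, best_loss = base_model, evaluate_loss(base_model)
    for checkpoint in past_checkpoints.values():
        loss = evaluate_loss(checkpoint)
        if loss < best_loss:
            best_model, best_loss = checkpoint, loss
    return best_model


def continual_step(
    base_model: Any,
    past_checkpoints: Dict[str, Any],
    evaluate_loss: Callable[[Any], float],
    fine_tune: Callable[[Any], Any],
) -> Any:
    """Initialize from the selected checkpoint, then fine-tune on the new task."""
    init = select_initialization(base_model, past_checkpoints, evaluate_loss)
    return fine_tune(init)
```

One design consideration under this assumption: scoring candidates by zero-shot validation loss is cheap but noisy, whereas a brief fine-tuning probe per candidate gives a more reliable selection at extra compute cost; either way, falling back to the pretrained base model guards against negative transfer from distracting past tasks.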