Rethinking the Stability-Plasticity Trade-off in Continual Learning from an Architectural Perspective
June 4, 2025
Authors: Aojun Lu, Hangjie Yuan, Tao Feng, Yanan Sun
cs.AI
Abstract
Continual Learning (CL) seeks to empower neural networks with the ability to learn and adapt incrementally. Central to this pursuit is the stability-plasticity dilemma: striking a balance between two conflicting objectives, preserving previously learned knowledge and acquiring new knowledge. While numerous CL methods aim to achieve this trade-off, they often overlook the impact of network architecture on stability and plasticity, restricting the trade-off to the parameter level. In this paper, we investigate the conflict between stability and plasticity at the architectural level. We reveal that under an equal parameter constraint, deeper networks exhibit better plasticity, while wider networks are characterized by superior stability.
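To make the parameter-matched comparison concrete, the sketch below builds two plain MLPs with roughly equal parameter budgets but opposite depth/width profiles. The layer sizes are illustrative assumptions, not the paper's configurations; in an actual study, plasticity would be probed by accuracy on newly learned tasks and stability by retention on earlier ones.

```python
import torch.nn as nn

def mlp(widths):
    """Plain fully connected net; widths = [in, hidden..., out]."""
    layers = []
    for i in range(len(widths) - 1):
        layers.append(nn.Linear(widths[i], widths[i + 1]))
        if i < len(widths) - 2:  # no activation after the output layer
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

def n_params(model):
    return sum(p.numel() for p in model.parameters())

# Deep-and-narrow: nine linear layers with hidden width 64
# (hypothesized to favor plasticity).
deep_net = mlp([784] + [64] * 8 + [10])
# Shallow-and-wide: two linear layers with hidden width 100
# (hypothesized to favor stability).
wide_net = mlp([784, 100, 10])

print(f"deep-narrow:  {n_params(deep_net):,} parameters")   # ~80,010
print(f"wide-shallow: {n_params(wide_net):,} parameters")   # ~79,510
```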
To address this architectural-level dilemma, we introduce Dual-Arch, a novel framework that serves as a plug-in component for CL. It leverages the complementary strengths of two distinct and independent networks, one dedicated to plasticity and the other to stability, each designed with a specialized, lightweight architecture tailored to its objective. Extensive experiments demonstrate that Dual-Arch enhances the performance of existing CL methods while being up to 87% more compact in terms of parameters.
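The abstract does not spell out how the two learners interact, so the following is only one plausible instantiation of a Dual-Arch-style plug-in: a deep, narrow plastic learner fits each new task first, and a wide, shallow stable learner then absorbs that knowledge through distillation. The two-stage loop, the KL-based transfer, and the temperature T are assumptions for illustration, not the authors' method.

```python
import torch
import torch.nn.functional as F

def train_task(plastic_net, stable_net, loader, opt_p, opt_s, T=2.0):
    """One task of a hypothetical Dual-Arch-style loop (illustrative only).

    loader yields (inputs, labels) for the current task; opt_p / opt_s are
    optimizers over the plastic and stable learners respectively.
    """
    # Stage 1: the deep, narrow plastic learner adapts to the new task.
    plastic_net.train()
    for x, y in loader:
        opt_p.zero_grad()
        F.cross_entropy(plastic_net(x), y).backward()
        opt_p.step()

    # Stage 2: the wide, shallow stable learner learns the task from hard
    # labels plus soft targets distilled from the frozen plastic learner.
    plastic_net.eval()
    stable_net.train()
    for x, y in loader:
        opt_s.zero_grad()
        with torch.no_grad():
            soft = F.softmax(plastic_net(x) / T, dim=1)
        logits = stable_net(x)
        loss = F.cross_entropy(logits, y) + (T * T) * F.kl_div(
            F.log_softmax(logits / T, dim=1), soft, reduction="batchmean")
        loss.backward()
        opt_s.step()
```

In this reading, rapid adaptation is confined to the plastic learner while the stable learner only ever sees smoothed targets, which is one way two specialized architectures could divide the stability-plasticity trade-off between them.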