Rethinking the Stability-Plasticity Trade-off in Continual Learning from an Architectural Perspective
June 4, 2025
Authors: Aojun Lu, Hangjie Yuan, Tao Feng, Yanan Sun
cs.AI
Abstract
Continual Learning (CL) seeks to empower neural networks with the ability to learn and adapt incrementally. Central to this pursuit is the stability-plasticity dilemma: striking a balance between two conflicting objectives, preserving previously learned knowledge and acquiring new knowledge. While numerous CL methods aim to achieve this trade-off, they often overlook the impact of network architecture on stability and plasticity, restricting the trade-off to the parameter level. In this paper, we investigate the conflict between stability and plasticity at the architectural level. We reveal that under an equal parameter constraint, deeper networks exhibit better plasticity, while wider networks are characterized by superior stability.
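To make the parameter-matched comparison concrete, the sketch below builds two plain MLPs with roughly equal parameter budgets but opposite depth/width profiles. The layer sizes are illustrative assumptions, not the paper's configurations; in an actual study, plasticity would be probed by accuracy on newly learned tasks and stability by retention on earlier ones.

```python
import torch.nn as nn

def mlp(widths):
    """Plain fully connected net; widths = [in, hidden..., out]."""
    layers = []
    for i in range(len(widths) - 1):
        layers.append(nn.Linear(widths[i], widths[i + 1]))
        if i < len(widths) - 2:  # no activation after the output layer
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

def n_params(model):
    return sum(p.numel() for p in model.parameters())

# Deep-and-narrow: nine linear layers with hidden width 64
# (hypothesized to favor plasticity).
deep_net = mlp([784] + [64] * 8 + [10])
# Shallow-and-wide: two linear layers with hidden width 100
# (hypothesized to favor stability).
wide_net = mlp([784, 100, 10])

print(f"deep-narrow:  {n_params(deep_net):,} parameters")   # ~80,010
print(f"wide-shallow: {n_params(wide_net):,} parameters")   # ~79,510
```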
To address this architectural-level dilemma, we introduce Dual-Arch, a novel framework that serves as a plug-in component for CL. It leverages the complementary strengths of two distinct and independent networks, one dedicated to plasticity and the other to stability, each designed with a specialized, lightweight architecture tailored to its objective. Extensive experiments demonstrate that Dual-Arch enhances the performance of existing CL methods while being up to 87% more compact in terms of parameters.
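The abstract does not spell out how the two learners interact, so the following is only one plausible instantiation of a Dual-Arch-style plug-in: a deep, narrow plastic learner fits each new task first, and a wide, shallow stable learner then absorbs that knowledge through distillation. The two-stage loop, the KL-based transfer, and the temperature T are assumptions for illustration, not the authors' method.

```python
import torch
import torch.nn.functional as F

def train_task(plastic_net, stable_net, loader, opt_p, opt_s, T=2.0):
    """One task of a hypothetical Dual-Arch-style loop (illustrative only).

    loader yields (inputs, labels) for the current task; opt_p / opt_s are
    optimizers over the plastic and stable learners respectively.
    """
    # Stage 1: the deep, narrow plastic learner adapts to the new task.
    plastic_net.train()
    for x, y in loader:
        opt_p.zero_grad()
        F.cross_entropy(plastic_net(x), y).backward()
        opt_p.step()

    # Stage 2: the wide, shallow stable learner learns the task from hard
    # labels plus soft targets distilled from the frozen plastic learner.
    plastic_net.eval()
    stable_net.train()
    for x, y in loader:
        opt_s.zero_grad()
        with torch.no_grad():
            soft = F.softmax(plastic_net(x) / T, dim=1)
        logits = stable_net(x)
        loss = F.cross_entropy(logits, y) + (T * T) * F.kl_div(
            F.log_softmax(logits / T, dim=1), soft, reduction="batchmean")
        loss.backward()
        opt_s.step()
```

In this reading, rapid adaptation is confined to the plastic learner while the stable learner only ever sees smoothed targets, which is one way two specialized architectures could divide the stability-plasticity trade-off between them.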