Rethinking the Stability-Plasticity Trade-off in Continual Learning from an Architectural Perspective
June 4, 2025
Authors: Aojun Lu, Hangjie Yuan, Tao Feng, Yanan Sun
cs.AI
Abstract
Continual Learning (CL) seeks to empower neural networks with
the ability to learn and adapt incrementally. Central to this pursuit is
addressing the stability-plasticity dilemma, which involves striking a balance
between two conflicting objectives: preserving previously learned knowledge and
acquiring new knowledge. While numerous CL methods aim to achieve this
trade-off, they often overlook the impact of network architecture on stability
and plasticity, restricting the trade-off to the parameter level. In this
paper, we delve into the conflict between stability and plasticity at the
architectural level. We reveal that under an equal parameter constraint, deeper
networks exhibit better plasticity, while wider networks are characterized by
superior stability. To address this architectural-level dilemma, we introduce a
novel framework denoted Dual-Arch, which serves as a plug-in component for CL.
This framework leverages the complementary strengths of two distinct and
independent networks: one dedicated to plasticity and the other to stability.
Each network is designed with a specialized and lightweight architecture,
tailored to its respective objective. Extensive experiments demonstrate that
Dual-Arch enhances the performance of existing CL methods while being up to 87%
more compact in terms of parameters.
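
As a concrete illustration of the "equal parameter constraint" comparison described above, the sketch below builds a deeper-but-narrower MLP and then searches for the hidden width of a shallower counterpart whose parameter budget roughly matches. The use of plain MLPs, and every dimension and depth chosen here, are illustrative assumptions; the paper's experiments presumably use standard vision backbones rather than these toy networks.

    import torch.nn as nn

    def mlp(in_dim: int, hidden: int, depth: int, out_dim: int) -> nn.Sequential:
        """Plain MLP with `depth` hidden layers of width `hidden`."""
        layers = [nn.Linear(in_dim, hidden), nn.ReLU()]
        for _ in range(depth - 1):
            layers += [nn.Linear(hidden, hidden), nn.ReLU()]
        layers.append(nn.Linear(hidden, out_dim))
        return nn.Sequential(*layers)

    def n_params(model: nn.Module) -> int:
        return sum(p.numel() for p in model.parameters())

    # A deeper-but-narrower network fixes the parameter budget.
    deep_narrow = mlp(in_dim=512, hidden=192, depth=10, out_dim=100)
    budget = n_params(deep_narrow)

    # Search for the width of a 2-layer network that best matches that budget.
    wide_width = min(range(16, 2049, 16),
                     key=lambda w: abs(n_params(mlp(512, w, 2, 100)) - budget))
    wide_shallow = mlp(in_dim=512, hidden=wide_width, depth=2, out_dim=100)

    print(f"deep-narrow  (10 x 192): {n_params(deep_narrow):,} params")
    print(f"wide-shallow (2 x {wide_width}): {n_params(wide_shallow):,} params")

Trained continually under such a matched budget, the paper's finding is that the deeper network adapts to new tasks more readily (plasticity), while the wider one forgets less (stability).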
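The abstract states that Dual-Arch pairs two independent, lightweight networks, one plasticity-oriented and one stability-oriented, as a plug-in around existing CL methods, but it does not spell out how the two are coupled. The sketch below shows one plausible wiring, knowledge distillation from the plastic learner into the stable one after each task; this coupling, and all names in the code (dual_arch_task_step, cl_loss_fn), are assumptions for illustration, not the authors' exact procedure.

    import torch
    import torch.nn.functional as F

    def dual_arch_task_step(plastic, stable, loader, cl_loss_fn,
                            epochs=1, lr=0.01, T=2.0):
        """One task of continual training under a Dual-Arch-style wrapper:
        the deep 'plastic' net learns the task with the base CL method's
        loss, then its soft predictions are distilled into the wide
        'stable' net."""
        # Phase 1: the plasticity-oriented network learns the new task.
        opt = torch.optim.SGD(plastic.parameters(), lr=lr)
        for _ in range(epochs):
            for x, y in loader:
                opt.zero_grad()
                cl_loss_fn(plastic(x), y).backward()  # any existing CL objective plugs in here
                opt.step()

        # Phase 2: consolidate into the stability-oriented network.
        opt = torch.optim.SGD(stable.parameters(), lr=lr)
        for _ in range(epochs):
            for x, y in loader:
                opt.zero_grad()
                with torch.no_grad():
                    teacher = F.softmax(plastic(x) / T, dim=1)
                logits = stable(x)
                # Distillation transfers the newly acquired knowledge; the
                # base CL loss term guards previously learned classes.
                kd = F.kl_div(F.log_softmax(logits / T, dim=1), teacher,
                              reduction="batchmean") * (T * T)
                (cl_loss_fn(logits, y) + kd).backward()
                opt.step()

Here cl_loss_fn stands for whatever objective the wrapped CL method already uses (e.g., cross-entropy plus its regularizer), and plastic and stable would be the deep-narrow and wide-shallow networks from the previous sketch, matching the abstract's description of two specialized, lightweight architectures.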