Scaling Continual Learning to 300+ Tasks with Bi-Level Routing Mixture-of-Experts

Abstract

Continual learning, especially class-incremental learning (CIL), built on a pre-trained model (PTM) has attracted substantial research interest in recent years. However, how to learn feature representations that are both discriminative and comprehensive while maintaining stability and plasticity over very long task sequences remains an open problem. We propose CaRE, a scalable Continual Learner with efficient Bi-Level Routing Mixture-of-Experts (BR-MoE). The core idea of BR-MoE is a bi-level routing mechanism: a router selection stage that dynamically activates the relevant task-specific routers, followed by an expert routing stage that dynamically activates and aggregates experts, with the aim of injecting discriminative and comprehensive representations into every intermediate network layer. We also introduce OmniBenchmark-1K, a challenging dataset for evaluating CIL performance on very long task sequences with hundreds of tasks. Extensive experiments show that CaRE achieves leading performance across a variety of datasets and task settings, including commonly used CIL datasets under classical CIL settings (e.g., 5-20 tasks). To the best of our knowledge, CaRE is the first continual learner that scales to very long task sequences (ranging from 100 to over 300 non-overlapping tasks), and it outperforms all baselines by a large margin on such sequences. We hope that this work will inspire further research into continual learning over extremely long task sequences. The code and dataset are publicly released at https://github.com/LMMMEng/CaRE.
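To make the bi-level routing idea concrete, the following is a minimal sketch in plain Python, not the released CaRE implementation. The abstract specifies only the two stages (router selection, then expert routing with aggregation). Everything else here is an assumption made for illustration: the function name `bi_level_route`, the dot-product router keys, the top-k selection with softmax weights, the linear experts, and the single expert pool shared by all routers.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, vector):
    return [dot(row, vector) for row in matrix]


def top_k(scores, k):
    """Indices of the k largest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def bi_level_route(x, router_keys, routers, experts, k_routers=2, k_experts=2):
    """Hypothetical two-stage routing for one feature vector x.

    router_keys: one key vector per task-specific router (assumed scoring rule)
    routers:     one (n_experts x dim) matrix per router, giving expert logits
    experts:     one (dim x dim) matrix per expert (assumed linear experts)
    """
    # Level 1, router selection: score each task-specific router against the
    # input and keep the k_routers best ones.
    router_scores = [dot(key, x) for key in router_keys]
    active_routers = top_k(router_scores, k_routers)
    router_weights = softmax([router_scores[r] for r in active_routers])

    # Each active router proposes logits over the expert pool; blend them
    # using the router weights.
    expert_logits = [0.0] * len(experts)
    for weight, r in zip(router_weights, active_routers):
        for e, logit in enumerate(matvec(routers[r], x)):
            expert_logits[e] += weight * logit

    # Level 2, expert routing: activate the k_experts best experts and
    # aggregate their outputs with softmax weights.
    active_experts = top_k(expert_logits, k_experts)
    expert_weights = softmax([expert_logits[e] for e in active_experts])
    out = [0.0] * len(x)
    for weight, e in zip(expert_weights, active_experts):
        for d, value in enumerate(matvec(experts[e], x)):
            out[d] += weight * value
    return out, active_routers, active_experts


def random_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy usage: 3 task-specific routers, 5 experts, 4-dimensional features.
rng = random.Random(0)
dim, n_routers, n_experts = 4, 3, 5
x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
router_keys = random_matrix(rng, n_routers, dim)
routers = [random_matrix(rng, n_experts, dim) for _ in range(n_routers)]
experts = [random_matrix(rng, dim, dim) for _ in range(n_experts)]
out, active_routers, active_experts = bi_level_route(x, router_keys, routers, experts)
```

In this sketch only `k_routers` routers and `k_experts` experts are evaluated per input, so the per-input cost of the second stage does not grow with the number of tasks. Whether CaRE uses this particular selection rule is not stated in the abstract.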
