

GeoStack: A Framework for Quasi-Abelian Knowledge Composition in VLMs

May 7, 2026
Authors: Pranav Mantini, Shishir K. Shah
cs.AI

Abstract

We address the challenge of knowledge composition in Vision-Language Models (VLMs), where accumulating expertise across multiple domains or tasks typically leads to catastrophic forgetting. We introduce GeoStack (Geometric Stacking), a modular framework that allows independently trained domain experts to be composed into a unified model. By imposing geometric and structural constraints on the adapter manifold, GeoStack ensures the foundational knowledge of the base model is preserved. Furthermore, we mathematically demonstrate a weight-folding property that achieves constant-time inference complexity (O(1)), regardless of the number of integrated experts. Experimental results across multi-domain adaptation and class-incremental learning show that GeoStack provides an efficient mechanism for long-term knowledge composition while significantly mitigating catastrophic forgetting. Code is available at https://github.com/QuantitativeImagingLaboratory/GeoStack.
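The weight-folding property mentioned above — constant-time inference regardless of how many experts are composed — can be illustrated with a minimal sketch. Assuming LoRA-style low-rank adapters (the abstract does not specify the adapter form, and GeoStack's geometric constraints on the adapter manifold are not reproduced here), the key idea is that additive expert updates can be folded into the base weight matrix once, so inference cost does not grow with the number of experts. The function name `fold_adapters` and all shapes are illustrative assumptions, not the paper's API:

```python
import numpy as np

def fold_adapters(w_base, adapters):
    """Fold a list of low-rank expert adapters (B_i, A_i) into the base
    weight matrix: W = W_base + sum_i B_i @ A_i.

    Hypothetical sketch: the real framework applies geometric/structural
    constraints to each adapter before composition.
    """
    w = w_base.copy()
    for B, A in adapters:
        w += B @ A  # each expert contributes a low-rank update
    return w

# Toy example: fold three independently trained "experts" into one matrix.
rng = np.random.default_rng(0)
d, r = 8, 2  # feature dim and adapter rank (illustrative)
w_base = rng.standard_normal((d, d))
experts = [
    (rng.standard_normal((d, r)), rng.standard_normal((r, d)))
    for _ in range(3)
]
w_folded = fold_adapters(w_base, experts)

# Inference is a single matmul with the folded weights -- O(1) in the
# number of experts, since folding happens once, offline.
x = rng.standard_normal(d)
y = w_folded @ x
```

The design point is that folding trades a one-time composition cost for expert-count-independent inference: the forward pass touches only `w_folded`, never the individual adapters.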