GeoStack: A Framework for Quasi-Abelian Knowledge Composition in VLMs
May 7, 2026
Authors: Pranav Mantini, Shishir K. Shah
cs.AI
Abstract
We address the challenge of knowledge composition in Vision-Language Models (VLMs), where accumulating expertise across multiple domains or tasks typically leads to catastrophic forgetting. We introduce GeoStack (Geometric Stacking), a modular framework that allows independently trained domain experts to be composed into a unified model. By imposing geometric and structural constraints on the adapter manifold, GeoStack ensures the foundational knowledge of the base model is preserved. Furthermore, we mathematically demonstrate a weight-folding property that achieves constant-time inference complexity (O(1)), regardless of the number of integrated experts. Experimental results across multi-domain adaptation and class-incremental learning show that GeoStack provides an efficient mechanism for long-term knowledge composition while significantly mitigating catastrophic forgetting. Code is available at https://github.com/QuantitativeImagingLaboratory/GeoStack.
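The abstract does not spell out the mechanics of weight folding, but the property it describes is natural for additive low-rank adapters: if each expert contributes an update of the form B_i A_i, the sum of all updates can be merged into the base weight matrix once, so the forward pass performs a single matrix multiply no matter how many experts were composed. Below is a minimal sketch of this idea, assuming LoRA-style adapters; the function name `fold_experts` and all shapes are illustrative, not taken from the paper or its codebase.

```python
import numpy as np

def fold_experts(base_W, experts):
    """Fold a list of low-rank expert adapters (B_i, A_i) into the base
    weight matrix. Merging happens once, offline; inference then costs a
    single matmul, independent of the number of experts (O(1) in the
    expert count)."""
    W = base_W.copy()
    for B, A in experts:
        W += B @ A  # each expert is an additive low-rank update B_i A_i
    return W

# Illustrative shapes: a d_out x d_in base weight and rank-r adapters.
d_out, d_in, r = 64, 32, 4
rng = np.random.default_rng(0)
base_W = rng.standard_normal((d_out, d_in))
experts = [(rng.standard_normal((d_out, r)) * 0.01,
            rng.standard_normal((r, d_in)) * 0.01) for _ in range(5)]

W_folded = fold_experts(base_W, experts)
x = rng.standard_normal(d_in)

# One matmul at inference time, with no per-expert computation:
y = W_folded @ x
```

Under this reading, the framework's contribution would lie in the geometric and structural constraints that make such a fold safe, i.e., that keep the summed updates from overwriting the base model's knowledge; the folding step itself is the standard adapter-merging trick shown above.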