GeoStack: A Framework for Quasi-Abelian Knowledge Composition in VLMs
May 7, 2026
Authors: Pranav Mantini, Shishir K. Shah
cs.AI
Abstract
We address the challenge of knowledge composition in Vision-Language Models (VLMs), where accumulating expertise across multiple domains or tasks typically leads to catastrophic forgetting. We introduce GeoStack (Geometric Stacking), a modular framework that allows independently trained domain experts to be composed into a unified model. By imposing geometric and structural constraints on the adapter manifold, GeoStack ensures the foundational knowledge of the base model is preserved. Furthermore, we mathematically demonstrate a weight-folding property that achieves constant-time inference complexity (O(1)), regardless of the number of integrated experts. Experimental results across multi-domain adaptation and class-incremental learning show that GeoStack provides an efficient mechanism for long-term knowledge composition while significantly mitigating catastrophic forgetting. Code is available at https://github.com/QuantitativeImagingLaboratory/GeoStack.
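The abstract does not spell out the mechanics of weight folding, but the property it describes is natural for additive low-rank adapters: if each expert contributes an update of the form B_i A_i, the sum of all updates can be merged into the base weight matrix once, so the forward pass performs a single matrix multiply no matter how many experts were composed. Below is a minimal sketch of this idea, assuming LoRA-style adapters; the function name `fold_experts` and all shapes are illustrative, not taken from the paper or its codebase.

```python
import numpy as np

def fold_experts(base_W, experts):
    """Fold a list of low-rank expert adapters (B_i, A_i) into the base
    weight matrix. Merging happens once, offline; inference then costs a
    single matmul, independent of the number of experts (O(1) in the
    expert count)."""
    W = base_W.copy()
    for B, A in experts:
        W += B @ A  # each expert is an additive low-rank update B_i A_i
    return W

# Illustrative shapes: a d_out x d_in base weight and rank-r adapters.
d_out, d_in, r = 64, 32, 4
rng = np.random.default_rng(0)
base_W = rng.standard_normal((d_out, d_in))
experts = [(rng.standard_normal((d_out, r)) * 0.01,
            rng.standard_normal((r, d_in)) * 0.01) for _ in range(5)]

W_folded = fold_experts(base_W, experts)
x = rng.standard_normal(d_in)

# One matmul at inference time, with no per-expert computation:
y = W_folded @ x
```

Under this reading, the framework's contribution would lie in the geometric and structural constraints that make such a fold safe, i.e., that keep the summed updates from overwriting the base model's knowledge; the folding step itself is the standard adapter-merging trick shown above.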