

GeoStack: A Framework for Quasi-Abelian Knowledge Composition in VLMs

May 7, 2026
Authors: Pranav Mantini, Shishir K. Shah
cs.AI

Abstract

We address the challenge of knowledge composition in Vision-Language Models (VLMs), where accumulating expertise across multiple domains or tasks typically leads to catastrophic forgetting. We introduce GeoStack (Geometric Stacking), a modular framework that allows independently trained domain experts to be composed into a unified model. By imposing geometric and structural constraints on the adapter manifold, GeoStack ensures the foundational knowledge of the base model is preserved. Furthermore, we mathematically demonstrate a weight-folding property that achieves constant-time inference complexity (O(1)), regardless of the number of integrated experts. Experimental results across multi-domain adaptation and class-incremental learning show that GeoStack provides an efficient mechanism for long-term knowledge composition while significantly mitigating catastrophic forgetting. Code is available at https://github.com/QuantitativeImagingLaboratory/GeoStack.
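The weight-folding property mentioned above — constant-time inference regardless of how many experts are composed — can be illustrated with a minimal sketch. Assuming LoRA-style low-rank adapters (the abstract does not specify the adapter form, and GeoStack's geometric constraints on the adapter manifold are not reproduced here), the key idea is that additive expert updates can be folded into the base weight matrix once, so inference cost does not grow with the number of experts. The function name `fold_adapters` and all shapes are illustrative assumptions, not the paper's API:

```python
import numpy as np

def fold_adapters(w_base, adapters):
    """Fold a list of low-rank expert adapters (B_i, A_i) into the base
    weight matrix: W = W_base + sum_i B_i @ A_i.

    Hypothetical sketch: the real framework applies geometric/structural
    constraints to each adapter before composition.
    """
    w = w_base.copy()
    for B, A in adapters:
        w += B @ A  # each expert contributes a low-rank update
    return w

# Toy example: fold three independently trained "experts" into one matrix.
rng = np.random.default_rng(0)
d, r = 8, 2  # feature dim and adapter rank (illustrative)
w_base = rng.standard_normal((d, d))
experts = [
    (rng.standard_normal((d, r)), rng.standard_normal((r, d)))
    for _ in range(3)
]
w_folded = fold_adapters(w_base, experts)

# Inference is a single matmul with the folded weights -- O(1) in the
# number of experts, since folding happens once, offline.
x = rng.standard_normal(d)
y = w_folded @ x
```

The design point is that folding trades a one-time composition cost for expert-count-independent inference: the forward pass touches only `w_folded`, never the individual adapters.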