GeoStack: A Framework for Quasi-Abelian Knowledge Composition in VLMs

Abstract

We address the challenge of knowledge composition in Vision-Language Models (VLMs), where accumulating expertise across multiple domains or tasks typically leads to catastrophic forgetting. We introduce GeoStack (Geometric Stacking), a modular framework in which independently trained domain experts are composed into a single unified model. By imposing geometric and structural constraints on the adapter manifold, GeoStack ensures that the foundational knowledge of the base model is preserved. Furthermore, we mathematically demonstrate a weight-folding property that yields constant inference complexity, O(1), with respect to the number of integrated experts. Experimental results on multi-domain adaptation and class-incremental learning show that GeoStack significantly mitigates catastrophic forgetting while providing an efficient mechanism for long-term knowledge composition. Code is available at https://github.com/QuantitativeImagingLaboratory/GeoStack.
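To make the weight-folding claim concrete, the sketch below illustrates the general idea of folding several adapters into one weight matrix so that inference cost no longer depends on how many experts were integrated. It is not GeoStack's actual construction: the abstract does not specify the adapter parameterization or the geometric constraints, so the additive low-rank form `B @ A` and every name here (`fold_experts`, `forward_unfolded`, the toy dimensions) are assumptions made purely for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def matadd(a, b):
    """Add two matrices of the same shape element-wise."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def rand_matrix(rows, cols, rng):
    """Build a rows x cols matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def fold_experts(base, experts):
    """Fold each expert's update B @ A into one dense matrix.

    ASSUMPTION: experts are additive low-rank updates; the paper's
    adapters may be parameterized differently.
    """
    folded = [row[:] for row in base]
    for b, a in experts:
        folded = matadd(folded, matmul(b, a))
    return folded

def forward_unfolded(base, experts, x):
    """Reference path: one extra pass per expert, so cost grows with their number."""
    y = matmul(base, x)
    for b, a in experts:
        y = matadd(y, matmul(b, matmul(a, x)))
    return y

rng = random.Random(0)
d_out, d_in, rank, num_experts = 4, 3, 2, 5  # hypothetical toy sizes
base = rand_matrix(d_out, d_in, rng)
experts = [
    (rand_matrix(d_out, rank, rng), rand_matrix(rank, d_in, rng))
    for _ in range(num_experts)
]
x = rand_matrix(d_in, 1, rng)

# Folding happens once, offline; inference is then a single matrix product.
folded = fold_experts(base, experts)
y_folded = matmul(folded, x)
y_reference = forward_unfolded(base, experts, x)
max_gap = max(abs(p[0] - q[0]) for p, q in zip(y_folded, y_reference))
```

Under these assumptions the folded matrix has the same shape as the base weight no matter how many experts are merged, which is the sense in which per-input inference cost stays constant; `max_gap` measures how closely the folded and unfolded paths agree up to floating-point rounding.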
