Large Multi-modal Models Can Interpret Features in Large Multi-modal Models

Abstract



Recent advances in Large Multimodal Models (LMMs) have led to significant breakthroughs in both academia and industry. One question that arises is how we, as humans, can understand their internal neural representations. This paper takes an initial step towards addressing this question by presenting a versatile framework to identify and interpret the semantics within LMMs. Specifically, 1) we first apply a Sparse Autoencoder (SAE) to disentangle the representations into human-understandable features; 2) we then present an automatic interpretation framework in which the LMMs themselves interpret the open-semantic features learned by the SAE. We employ this framework to analyze the LLaVA-NeXT-8B model using the LLaVA-OV-72B model, demonstrating that these features can effectively steer the model's behavior. Our results contribute to a deeper understanding of why LMMs excel in specific tasks, including EQ tests, and illuminate the nature of their mistakes along with potential strategies for their rectification. These findings offer new insights into the internal mechanisms of LMMs and suggest parallels with the cognitive processes of the human brain.
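The SAE step and feature steering summarized above can be illustrated with a toy sketch. It assumes a common SAE formulation (a ReLU encoder trained with a reconstruction loss plus an L1 sparsity penalty) and assumes steering means adding a scaled decoder direction to an activation; the abstract does not specify the paper's SAE architecture, the layer it reads from, or its steering method. All names and dimensions here (`SparseAutoencoder`, `top_activating`, `d_model=8`, `n_features=32`) are hypothetical, and training (minimizing `loss` by gradient descent) is omitted.

```python
import math
import random


def relu(v):
    # Elementwise max(0, x): keeps only positively activated features
    return [max(0.0, x) for x in v]


def matvec(m, v):
    # Multiply matrix m (a list of rows) by vector v
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class SparseAutoencoder:
    """Toy SAE mapping d_model-dim activations to n_features sparse codes and back."""

    def __init__(self, d_model, n_features, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(d_model)
        # Encoder weights: n_features rows, each of length d_model
        self.w_enc = [[rng.gauss(0.0, scale) for _ in range(d_model)]
                      for _ in range(n_features)]
        self.b_enc = [0.0] * n_features
        # Decoder weights: d_model rows, each of length n_features,
        # initialized as the encoder transpose (one common choice, not a requirement)
        self.w_dec = [[self.w_enc[j][i] for j in range(n_features)]
                      for i in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        # f = ReLU(W_enc (x - b_dec) + b_enc)
        centered = [xi - bi for xi, bi in zip(x, self.b_dec)]
        pre = [p + b for p, b in zip(matvec(self.w_enc, centered), self.b_enc)]
        return relu(pre)

    def decode(self, f):
        # x_hat = W_dec f + b_dec
        return [r + b for r, b in zip(matvec(self.w_dec, f), self.b_dec)]

    def loss(self, x, l1_coeff=1e-3):
        # Reconstruction MSE plus an L1 penalty that pushes most features to zero
        f = self.encode(x)
        x_hat = self.decode(f)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        return mse + l1_coeff * sum(abs(v) for v in f)

    def feature_direction(self, i):
        # Column i of the decoder: the direction feature i writes into activation space
        return [row[i] for row in self.w_dec]

    def steer(self, x, i, alpha):
        # Shift the activation along feature i's decoder direction, scaled by alpha
        d = self.feature_direction(i)
        return [xi + alpha * di for xi, di in zip(x, d)]


def top_activating(sae, examples, i, k=3):
    # Rank example activations by how strongly feature i fires on them
    return sorted(examples, key=lambda ex: sae.encode(ex)[i], reverse=True)[:k]


# Hypothetical usage with tiny dimensions and random stand-in activations
sae = SparseAutoencoder(d_model=8, n_features=32)
rng = random.Random(1)
examples = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(10)]
codes = sae.encode(examples[0])
active = [i for i, v in enumerate(codes) if v > 0]
best = top_activating(sae, examples, i=3)
steered = sae.steer(examples[0], i=3, alpha=4.0)
```

If, as assumed here, the explainer model is shown each feature's top-activating inputs, the output of `top_activating` is what would be handed to the larger LMM (LLaVA-OV-72B in the paper) to describe the feature in natural language; that prompting step is not shown.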

