Large Multi-modal Models Can Interpret Features in Large Multi-modal Models

Abstract

Recent advances in Large Multimodal Models (LMMs) have led to significant breakthroughs in both academia and industry. One question that arises is how we, as humans, can understand their internal neural representations. This paper takes an initial step towards addressing this question by presenting a versatile framework to identify and interpret the semantics within LMMs. Specifically, 1) we first apply a Sparse Autoencoder (SAE) to disentangle the representations into human-understandable features. 2) We then present an automatic interpretation framework in which the open-semantic features learned by the SAE are interpreted by the LMMs themselves. We employ this framework to analyze the LLaVA-NeXT-8B model using the LLaVA-OV-72B model, demonstrating that these features can effectively steer the model's behavior. Our results contribute to a deeper understanding of why LMMs excel in specific tasks, including EQ tests, and illuminate the nature of their mistakes along with potential strategies for their rectification. These findings offer new insights into the internal mechanisms of LMMs and suggest parallels with the cognitive processes of the human brain.
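
To make the two technical ideas in the abstract concrete (decomposing an activation vector into sparse features, and steering by pushing the activation along one feature's direction), here is a minimal pure-Python sketch. It is not the authors' implementation: the ReLU encoder with an L1 sparsity penalty is one common SAE formulation assumed here for illustration, and the class name `SparseAutoencoder`, the function `steer`, the dimensions, and the random untrained weights are all hypothetical. The abstract does not specify the architecture, the layer being analyzed, or the steering procedure.

```python
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class SparseAutoencoder:
    """Toy SAE (assumed form): f = ReLU(W_enc (x - b_dec) + b_enc),
    x_hat = sum_i f_i * w_dec[i] + b_dec. Weights are random, not trained."""

    def __init__(self, d_model, d_feat, seed=0):
        rng = random.Random(seed)
        self.w_enc = [[rng.gauss(0, 0.1) for _ in range(d_model)]
                      for _ in range(d_feat)]
        self.b_enc = [0.0] * d_feat
        # One decoder direction of length d_model per feature.
        self.w_dec = [[rng.gauss(0, 0.1) for _ in range(d_model)]
                      for _ in range(d_feat)]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        """Map an activation vector to non-negative feature activations."""
        centered = [xi - bi for xi, bi in zip(x, self.b_dec)]
        pre = matvec(self.w_enc, centered)
        return [max(0.0, p + b) for p, b in zip(pre, self.b_enc)]

    def decode(self, f):
        """Reconstruct the activation as a weighted sum of decoder directions."""
        out = list(self.b_dec)
        for fi, direction in zip(f, self.w_dec):
            for j, dj in enumerate(direction):
                out[j] += fi * dj
        return out

    def loss(self, x, l1_coeff=1e-3):
        """Reconstruction error plus an L1 penalty that encourages sparsity."""
        f = self.encode(x)
        x_hat = self.decode(f)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        return mse + l1_coeff * sum(abs(v) for v in f)


def steer(x, sae, feature_idx, strength):
    """Shift activation x along one feature's decoder direction."""
    direction = sae.w_dec[feature_idx]
    return [xi + strength * di for xi, di in zip(x, direction)]


if __name__ == "__main__":
    sae = SparseAutoencoder(d_model=4, d_feat=8)
    activation = [0.5, -1.0, 0.25, 2.0]
    print("features:", sae.encode(activation))
    print("loss:", sae.loss(activation))
    print("steered:", steer(activation, sae, feature_idx=3, strength=2.0))
```

In a real setting the SAE would be trained on activations collected from the model under study, and the steered vector would be written back into the forward pass; both steps are omitted from this sketch.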
