Using Captum to Explain Generative Language Models
December 9, 2023
Authors: Vivek Miglani, Aobo Yang, Aram H. Markosyan, Diego Garcia-Olano, Narine Kokhlikyan
cs.AI
Abstract
Captum is a comprehensive library for model explainability in PyTorch,
offering a range of methods from the interpretability literature to enhance
users' understanding of PyTorch models. In this paper, we introduce new
features in Captum that are specifically designed to analyze the behavior of
generative language models. We provide an overview of the available
functionality and example applications that illustrate its potential for
understanding learned associations within generative language models.