Using Captum to Explain Generative Language Models
December 9, 2023
Authors: Vivek Miglani, Aobo Yang, Aram H. Markosyan, Diego Garcia-Olano, Narine Kokhlikyan
cs.AI
Abstract
Captum is a comprehensive library for model explainability in PyTorch,
offering a range of methods from the interpretability literature to enhance
users' understanding of PyTorch models. In this paper, we introduce new
features in Captum that are specifically designed to analyze the behavior of
generative language models. We provide an overview of the available
functionalities and present example applications that illustrate their
potential for understanding learned associations within generative language
models.