Rethinking Interpretability in the Era of Large Language Models
January 30, 2024
Authors: Chandan Singh, Jeevana Priya Inala, Michel Galley, Rich Caruana, Jianfeng Gao
cs.AI
Abstract
Interpretable machine learning has exploded as an area of interest over the
last decade, sparked by the rise of increasingly large datasets and deep neural
networks. Simultaneously, large language models (LLMs) have demonstrated
remarkable capabilities across a wide array of tasks, offering a chance to
rethink opportunities in interpretable machine learning. Notably, the
capability to explain in natural language allows LLMs to expand the scale and
complexity of patterns that can be given to a human. However, these new
capabilities raise new challenges, such as hallucinated explanations and
immense computational costs.
In this position paper, we start by reviewing existing methods to evaluate
the emerging field of LLM interpretation (both interpreting LLMs and using LLMs
for explanation). We contend that, despite their limitations, LLMs hold the
opportunity to redefine interpretability with a more ambitious scope across
many applications, including in auditing LLMs themselves. We highlight two
emerging research priorities for LLM interpretation: using LLMs to directly
analyze new datasets and to generate interactive explanations.
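To make the second research priority concrete, the sketch below shows one minimal way an LLM could "directly analyze" a new dataset: serialize a small table into a prompt and ask the model for a natural-language description of its patterns. This is an illustrative assumption, not a method from the paper; `describe_dataset` and the `query_llm` callable are hypothetical placeholders for whatever LLM API is available.

```python
# Minimal sketch (hypothetical): asking an LLM to summarize patterns
# in a small tabular dataset via a natural-language prompt.
from typing import Callable, Sequence


def describe_dataset(
    rows: Sequence[dict],
    query_llm: Callable[[str], str],  # placeholder for any LLM API call
    max_rows: int = 20,
) -> str:
    """Serialize a small table into a prompt and ask the LLM to explain it."""
    header = ", ".join(rows[0].keys())
    lines = [", ".join(str(v) for v in r.values()) for r in rows[:max_rows]]
    prompt = (
        "Here is a small dataset in CSV format:\n"
        f"{header}\n" + "\n".join(lines) + "\n\n"
        "Describe, in plain language, the main patterns or relationships "
        "you observe, and note any caveats (e.g., small sample size)."
    )
    return query_llm(prompt)


if __name__ == "__main__":
    # Toy example with a stand-in LLM call; a real setup would route
    # the prompt to an actual model.
    toy_rows = [
        {"dose_mg": 10, "recovered": 0},
        {"dose_mg": 50, "recovered": 1},
        {"dose_mg": 80, "recovered": 1},
    ]
    fake_llm = lambda prompt: "(LLM response would appear here)"
    print(describe_dataset(toy_rows, fake_llm))
```

The same serialize-and-prompt pattern extends naturally to the interactive setting the abstract mentions: follow-up questions about the dataset can simply be appended to the conversation, though hallucinated explanations remain a risk the paper highlights.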