Rethinking Interpretability in the Era of Large Language Models
January 30, 2024
Authors: Chandan Singh, Jeevana Priya Inala, Michel Galley, Rich Caruana, Jianfeng Gao
cs.AI
Abstract
Interpretable machine learning has exploded as an area of interest over the
last decade, sparked by the rise of increasingly large datasets and deep neural
networks. Simultaneously, large language models (LLMs) have demonstrated
remarkable capabilities across a wide array of tasks, offering a chance to
rethink opportunities in interpretable machine learning. Notably, the
capability to explain in natural language allows LLMs to expand the scale and
complexity of patterns that can be given to a human. However, these new
capabilities raise new challenges, such as hallucinated explanations and
immense computational costs.
In this position paper, we start by reviewing existing methods to evaluate
the emerging field of LLM interpretation (both interpreting LLMs and using LLMs
for explanation). We contend that, despite their limitations, LLMs hold the
opportunity to redefine interpretability with a more ambitious scope across
many applications, including in auditing LLMs themselves. We highlight two
emerging research priorities for LLM interpretation: using LLMs to directly
analyze new datasets and to generate interactive explanations.
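To make the second research priority concrete, the sketch below shows one minimal way an LLM could "directly analyze" a new dataset: serialize a small table into a prompt and ask the model for a natural-language description of its patterns. This is an illustrative assumption, not a method from the paper; `describe_dataset` and the `query_llm` callable are hypothetical placeholders for whatever LLM API is available.

```python
# Minimal sketch (hypothetical): asking an LLM to summarize patterns
# in a small tabular dataset via a natural-language prompt.
from typing import Callable, Sequence


def describe_dataset(
    rows: Sequence[dict],
    query_llm: Callable[[str], str],  # placeholder for any LLM API call
    max_rows: int = 20,
) -> str:
    """Serialize a small table into a prompt and ask the LLM to explain it."""
    header = ", ".join(rows[0].keys())
    lines = [", ".join(str(v) for v in r.values()) for r in rows[:max_rows]]
    prompt = (
        "Here is a small dataset in CSV format:\n"
        f"{header}\n" + "\n".join(lines) + "\n\n"
        "Describe, in plain language, the main patterns or relationships "
        "you observe, and note any caveats (e.g., small sample size)."
    )
    return query_llm(prompt)


if __name__ == "__main__":
    # Toy example with a stand-in LLM call; a real setup would route
    # the prompt to an actual model.
    toy_rows = [
        {"dose_mg": 10, "recovered": 0},
        {"dose_mg": 50, "recovered": 1},
        {"dose_mg": 80, "recovered": 1},
    ]
    fake_llm = lambda prompt: "(LLM response would appear here)"
    print(describe_dataset(toy_rows, fake_llm))
```

The same serialize-and-prompt pattern extends naturally to the interactive setting the abstract mentions: follow-up questions about the dataset can simply be appended to the conversation, though hallucinated explanations remain a risk the paper highlights.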