Rethinking Interpretability in the Era of Large Language Models

Abstract

Interpretable machine learning has exploded as an area of interest over the last decade, sparked by the rise of increasingly large datasets and deep neural networks. Simultaneously, large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks, offering a chance to rethink opportunities in interpretable machine learning. Notably, the capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be given to a human. However, these new capabilities raise new challenges, such as hallucinated explanations and immense computational costs. In this position paper, we start by reviewing existing methods to evaluate the emerging field of LLM interpretation (both interpreting LLMs and using LLMs for explanation). We contend that, despite their limitations, LLMs hold the opportunity to redefine interpretability with a more ambitious scope across many applications, including in auditing LLMs themselves. We highlight two emerging research priorities for LLM interpretation: using LLMs to directly analyze new datasets and to generate interactive explanations.
