We Can't Understand AI Using our Existing Vocabulary

Abstract

This position paper argues that, in order to understand AI, we cannot rely on our existing vocabulary of human words. Instead, we should strive to develop neologisms: new words that represent precise human concepts that we want to teach machines, or machine concepts that we need to learn. We start from the premise that humans and machines have differing concepts. This means interpretability can be framed as a communication problem: humans must be able to reference and control machine concepts, and communicate human concepts to machines. We believe that creating a shared human-machine language by developing neologisms could solve this communication problem. Successful neologisms achieve a useful amount of abstraction: not too detailed, so they're reusable in many contexts, and not too high-level, so they convey precise information. As a proof of concept, we demonstrate how a "length neologism" enables controlling LLM response length, while a "diversity neologism" allows sampling more variable responses. Taken together, we argue that we cannot understand AI using our existing vocabulary, and that expanding it through neologisms creates opportunities both to control and to understand machines better.