Implicit Bias-Like Patterns in Reasoning Models

Abstract

Implicit bias refers to automatic or spontaneous mental processes that shape perceptions, judgments, and behaviors. Previous research on "implicit bias" in large language models (LLMs) has often approached the phenomenon differently from how it is studied in humans, focusing primarily on model outputs rather than on model processing. To examine model processing, we present a method called the Reasoning Model Implicit Association Test (RM-IAT) for studying implicit bias-like patterns in reasoning models: LLMs that employ step-by-step reasoning to solve complex tasks. Using this method, we find that reasoning models require more tokens when processing association-incompatible information than when processing association-compatible information. These findings suggest that AI systems harbor information-processing patterns analogous to human implicit bias. We consider the implications of these implicit bias-like patterns for the deployment of such models in real-world applications.
