ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind

Abstract

Large language models (LLMs) have shown promising potential in persuasion, but existing work on training LLM persuaders is still preliminary. Notably, while humans are skilled at modeling their opponent's thoughts and opinions proactively and dynamically, current LLMs struggle with such Theory of Mind (ToM) reasoning, resulting in limited diversity and opponent awareness. To address this limitation, we introduce Theory of Mind Augmented Persuader (ToMAP), a novel approach for building more flexible persuader agents by incorporating two theory of mind modules that enhance the persuader's awareness and analysis of the opponent's mental state. Specifically, we begin by prompting the persuader to consider possible objections to the target central claim, and then use a text encoder paired with a trained MLP classifier to predict the opponent's current stance on these counterclaims. Our carefully designed reinforcement learning schema enables the persuader to learn how to analyze opponent-related information and use it to generate more effective arguments. Experiments show that the ToMAP persuader, while containing only 3B parameters, outperforms much larger baselines, such as GPT-4o, with a relative gain of 39.4% across multiple persuadee models and diverse corpora. Notably, ToMAP exhibits complex reasoning chains and reduced repetition during training, which leads to more diverse and effective arguments. The opponent-aware feature of ToMAP also makes it suitable for long conversations and enables it to employ more logical and opponent-aware strategies. These results underscore our method's effectiveness and highlight its potential for developing more persuasive language agents. Code is available at: https://github.com/ulab-uiuc/ToMAP.
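The stance-prediction step described in the abstract (a text encoder paired with an MLP classifier that scores the opponent's stance on each counterclaim) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the ToMAP code: the hashed bag-of-words `toy_encode` stands in for a real text encoder, `StanceMLP` uses random untrained weights instead of a trained classifier, and the names `toy_encode`, `StanceMLP`, and `predict_stances` are hypothetical. The sketch only shows the data flow: encode the dialogue and each counterclaim, concatenate the two vectors, and map them to a probability of agreement.

```python
import hashlib
import math
import random


def toy_encode(text: str, dim: int = 16) -> list[float]:
    """Hypothetical stand-in for a real text encoder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero on empty text
    return [v / norm for v in vec]


class StanceMLP:
    """One-hidden-layer MLP mapping [dialogue; counterclaim] features to P(opponent agrees)."""

    def __init__(self, in_dim: int, hidden_dim: int = 8, seed: int = 0):
        rng = random.Random(seed)
        # Random, untrained weights: placeholders for parameters a real system would learn.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
        self.b2 = 0.0

    def predict(self, features: list[float]) -> float:
        # Hidden layer with ReLU activation.
        hidden = [
            max(0.0, sum(w * x for w, x in zip(row, features)) + b)
            for row, b in zip(self.w1, self.b1)
        ]
        logit = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return 1.0 / (1.0 + math.exp(-logit))  # sigmoid turns the logit into a probability


def predict_stances(
    dialogue: str, counterclaims: list[str], mlp: StanceMLP, dim: int = 16
) -> dict[str, float]:
    """Score each counterclaim against the dialogue so far."""
    dialogue_vec = toy_encode(dialogue, dim)
    return {
        claim: mlp.predict(dialogue_vec + toy_encode(claim, dim))  # list concatenation
        for claim in counterclaims
    }


if __name__ == "__main__":
    mlp = StanceMLP(in_dim=32)  # 16 dialogue dims + 16 counterclaim dims
    scores = predict_stances(
        "I think remote work hurts teamwork",
        ["Remote work reduces collaboration", "Commuting wastes time"],
        mlp,
    )
    for claim, prob in scores.items():
        print(f"{claim}: {prob:.3f}")
```

Because the weights are untrained, the printed probabilities carry no meaning; in a setup like the one the abstract describes, the encoder would be a pretrained model and the MLP would be fit on labeled stance data before its predictions are passed to the persuader.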
