Benchmarking LLMs for Political Science: A United Nations Perspective

Abstract

Large Language Models (LLMs) have achieved significant advances in natural language processing, yet their potential for high-stakes political decision-making remains largely unexplored. This paper addresses that gap by focusing on the application of LLMs to the United Nations (UN) decision-making process, where the stakes are particularly high and political decisions can have far-reaching consequences. We introduce a novel dataset comprising publicly available UN Security Council (UNSC) records from 1994 to 2024, including draft resolutions, voting records, and diplomatic speeches. Using this dataset, we propose the United Nations Benchmark (UNBench), the first comprehensive benchmark designed to evaluate LLMs across four interconnected political science tasks: co-penholder judgment, representative voting simulation, draft adoption prediction, and representative statement generation. These tasks span the three stages of the UN decision-making process (drafting, voting, and discussing) and aim to assess LLMs' ability to understand and simulate political dynamics. Our experimental analysis demonstrates both the potential and the challenges of applying LLMs in this domain, providing insights into their strengths and limitations in political science. This work contributes to the growing intersection of AI and political science, opening new avenues for research and practical applications in global governance. The UNBench Repository can be accessed at: https://github.com/yueqingliang1/UNBench.
