A Framework for Automated Measurement of Responsible AI Harms in Generative AI Applications

Abstract

We present a framework for the automated measurement of responsible AI (RAI) metrics for large language models (LLMs) and associated products and services. Our framework for automatically measuring harms from LLMs builds on existing technical and sociotechnical expertise and leverages the capabilities of state-of-the-art LLMs, such as GPT-4. We use this framework to conduct several case studies investigating how different LLMs may violate a range of RAI-related principles. The framework may be employed alongside domain-specific sociotechnical expertise to create measurements for new harm areas in the future. By implementing this framework, we aim to enable more advanced harm measurement efforts and further the responsible use of LLMs.
