Data and AI governance: Promoting equity, ethics, and fairness in large language models

Abstract

In this paper, we cover approaches to systematically govern, assess, and quantify bias across the complete life cycle of machine learning models, from initial development and validation to ongoing production monitoring and guardrail implementation. Building upon our foundational work on the Bias Evaluation and Assessment Test Suite (BEATS) for Large Language Models (LLMs), we share prevalent bias- and fairness-related gaps in LLMs and discuss a data and AI governance framework to address Bias, Ethics, Fairness, and Factuality within LLMs. The data and AI governance approach discussed in this paper is suitable for practical, real-world applications: it enables rigorous benchmarking of LLMs prior to production deployment, facilitates continuous real-time evaluation, and supports proactive governance of LLM-generated responses. By implementing data and AI governance across the life cycle of AI development, organizations can significantly enhance the safety and responsibility of their GenAI systems, effectively mitigating risks of discrimination and protecting against potential reputational or brand-related harm. Ultimately, through this article, we aim to contribute to advancing the creation and deployment of socially responsible and ethically aligned applications powered by generative artificial intelligence.
