A Framework for Automated Measurement of Responsible AI Harms in Generative AI Applications

Abstract

We present a framework for the automated measurement of responsible AI (RAI) metrics for large language models (LLMs) and associated products and services. The framework builds on existing technical and sociotechnical expertise and leverages the capabilities of state-of-the-art LLMs, such as GPT-4. We use it to run several case studies investigating how different LLMs may violate a range of RAI-related principles. The framework may be employed alongside domain-specific sociotechnical expertise to create measurements for new harm areas in the future. By implementing this framework, we aim to enable more advanced harm measurement efforts and further the responsible use of LLMs.
