Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models

Abstract

Data is a crucial element in large language model (LLM) alignment. Recent studies have explored using LLMs for efficient data collection. However, LLM-generated data often suffers from quality issues, such as underrepresented or absent aspects and low-quality data points. To address these problems, we propose Data Advisor, an enhanced LLM-based method for generating data that takes into account the characteristics of the desired dataset. Guided by a set of predefined principles, Data Advisor monitors the status of the generated data, identifies weaknesses in the current dataset, and advises the next iteration of data generation accordingly. Data Advisor can be easily integrated into existing data generation methods to enhance data quality and coverage. Experiments on safety alignment of three representative LLMs (i.e., Mistral, Llama2, and Falcon) demonstrate the effectiveness of Data Advisor in enhancing model safety against various fine-grained safety issues without sacrificing model utility.
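The monitor, identify, advise cycle described above can be illustrated with a minimal sketch. This is not the authors' implementation: in the paper each step is LLM-based, whereas here simple counting stands in for monitoring and weakness identification, and a stub stands in for the data generator. The category names, function names, and record format are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical safety categories; the paper's actual taxonomy is not given here.
CATEGORIES = ["privacy", "self-harm", "misinformation"]


def summarize(dataset, categories):
    """Monitor: count how many generated items fall into each category."""
    counts = Counter(item["category"] for item in dataset)
    return {c: counts.get(c, 0) for c in categories}


def identify_weakness(summary):
    """Identify: pick the least-covered category (ties go to the first listed)."""
    return min(summary, key=summary.get)


def advise(weakness):
    """Advise: turn the weakness into an instruction for the next generation step."""
    return f"Generate a prompt about the underrepresented safety issue: {weakness}"


def stub_generate(advice, category):
    """Stand-in for an LLM call (assumption): returns a placeholder record."""
    return {"category": category, "prompt": f"[stub prompt for {category}]", "advice": advice}


def run_data_advisor(generate, categories, iterations):
    """Run the monitor -> identify -> advise -> generate loop."""
    dataset = []
    for _ in range(iterations):
        summary = summarize(dataset, categories)
        weakness = identify_weakness(summary)
        dataset.append(generate(advise(weakness), weakness))
    return dataset


dataset = run_data_advisor(stub_generate, CATEGORIES, iterations=6)
coverage = summarize(dataset, CATEGORIES)
```

Because each iteration targets the currently least-covered category, the toy loop spreads generated items evenly across categories. A real system would replace `stub_generate` and the counting heuristics with model calls.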