compar:IA: The French Government's LLM arena to collect French-language human prompts and preference data

Abstract

Large Language Models (LLMs) often show reduced performance, cultural alignment, and safety robustness in non-English languages, partly because English dominates both pre-training data and human preference alignment datasets. Training methods like Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) require human preference data, which remains scarce and largely non-public for many languages beyond English. To address this gap, we introduce compar:IA, an open-source digital public service developed inside the French government and designed to collect large-scale human preference data from a predominantly French-speaking general audience. The platform uses a blind pairwise comparison interface to capture unconstrained, real-world prompts and user judgments across a diverse set of language models, while maintaining low participation friction and privacy-preserving automated filtering. As of 2026-02-07, compar:IA has collected over 600,000 free-form prompts and 250,000 preference votes, with approximately 89% of the data in French. We release three complementary datasets (conversations, votes, and reactions) under open licenses, and present initial analyses, including a French-language model leaderboard and user interaction patterns. Beyond the French context, compar:IA is evolving toward an international digital public good, offering reusable infrastructure for multilingual model training, evaluation, and the study of human-AI interaction.
