生成基盤モデルの信頼性について：ガイドライン、評価、展望

要旨

生成基盤モデル（GenFMs）は、変革をもたらすツールとして登場しました。しかし、その広範な採用は、信頼性に関する重要な懸念を引き起こしています。本論文では、これらの課題に対処するための包括的なフレームワークを3つの主要な貢献を通じて提示します。まず、政府や規制機関によるグローバルなAIガバナンスの法律や政策、および業界の実践と標準を体系的にレビューします。この分析に基づき、技術的、倫理的、法的、社会的な視点を統合した多分野の協力を通じて、GenFMsのための一連のガイドライン原則を提案します。次に、テキストから画像、大規模言語、視覚言語モデルなど、複数の次元とモデルタイプにわたる信頼性を評価するために設計された初の動的ベンチマークプラットフォームであるTrustGenを紹介します。TrustGenは、メタデータのキュレーション、テストケースの生成、文脈の変動といったモジュールコンポーネントを活用し、静的評価手法の限界を克服する適応的かつ反復的な評価を可能にします。TrustGenを使用して、信頼性における重要な進展を明らかにするとともに、持続的な課題を特定します。最後に、信頼性のあるGenFMsの課題と将来の方向性について詳細に議論し、信頼性の複雑で進化する性質を明らかにし、有用性と信頼性の間の微妙なトレードオフや、さまざまな下流アプリケーションに対する考慮事項を強調し、持続的な課題を特定し、将来の研究のための戦略的なロードマップを提供します。この研究は、GenAIの信頼性を向上させるための包括的なフレームワークを確立し、GenFMsを重要なアプリケーションに安全かつ責任を持って統合する道を開きます。コミュニティの進歩を促進するために、動的評価のためのツールキットを公開します。

English

Generative Foundation Models (GenFMs) have emerged as transformative tools. However, their widespread adoption raises critical concerns regarding trustworthiness across dimensions. This paper presents a comprehensive framework to address these challenges through three key contributions. First, we systematically review global AI governance laws and policies from governments and regulatory bodies, as well as industry practices and standards. Based on this analysis, we propose a set of guiding principles for GenFMs, developed through extensive multidisciplinary collaboration that integrates technical, ethical, legal, and societal perspectives. Second, we introduce TrustGen, the first dynamic benchmarking platform designed to evaluate trustworthiness across multiple dimensions and model types, including text-to-image, large language, and vision-language models. TrustGen leverages modular components--metadata curation, test case generation, and contextual variation--to enable adaptive and iterative assessments, overcoming the limitations of static evaluation methods. Using TrustGen, we reveal significant progress in trustworthiness while identifying persistent challenges. Finally, we provide an in-depth discussion of the challenges and future directions for trustworthy GenFMs, which reveals the complex, evolving nature of trustworthiness, highlighting the nuanced trade-offs between utility and trustworthiness, and consideration for various downstream applications, identifying persistent challenges and providing a strategic roadmap for future research. This work establishes a holistic framework for advancing trustworthiness in GenAI, paving the way for safer and more responsible integration of GenFMs into critical applications. To facilitate advancement in the community, we release the toolkit for dynamic evaluation.

生成基盤モデルの信頼性について：ガイドライン、評価、展望

On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective

要旨

Support