IPBench: Benchmarking the Knowledge of Large Language Models in Intellectual Property

Abstract

Intellectual Property (IP) is a unique domain that integrates technical and legal knowledge, making it inherently complex and knowledge-intensive. As large language models (LLMs) continue to advance, they show great potential for processing IP tasks, enabling more efficient analysis, understanding, and generation of IP-related content. However, existing datasets and benchmarks either focus narrowly on patents or cover only limited aspects of the IP field, and so align poorly with real-world scenarios. To bridge this gap, we introduce the first comprehensive IP task taxonomy and a large, diverse bilingual benchmark, IPBench, covering 8 IP mechanisms and 20 tasks. The benchmark is designed to evaluate LLMs in real-world intellectual property applications, encompassing both understanding and generation. We benchmark 16 LLMs, ranging from general-purpose to domain-specific models, and find that even the best-performing model achieves only 75.8% accuracy, revealing substantial room for improvement. Notably, open-source IP and law-oriented models lag behind closed-source general-purpose models. We publicly release all data and code of IPBench and will continue to update it with additional IP-related tasks to better reflect real-world challenges in the intellectual property domain.
