PatRe: A Full-Stage Office Action and Rebuttal Generation Benchmark for Patent Examination

Abstract

Patent examination is a complex, multi-stage process that requires both technical expertise and legal reasoning, and rising application volumes make it increasingly difficult. Prior benchmarks predominantly treat patent examination as discriminative classification or static information extraction, and so fail to capture its inherently interactive and iterative nature, which resembles the peer review and rebuttal process in academic publishing. In this paper, we introduce PatRe, the first benchmark that models the full patent examination lifecycle, including Office Action generation and applicant rebuttal. PatRe comprises 480 real-world cases and supports both oracle and retrieval-simulated evaluation settings. Our benchmark reframes patent examination as a dynamic, multi-turn process of justification and response. Extensive experiments across various large language models (LLMs) reveal critical insights into model performance, including differences between proprietary and open-source models and task asymmetries between examiner-side analysis and applicant-side rebuttal. These findings highlight both the potential and the current limitations of LLMs in modeling complex, real-world legal reasoning and technical novelty judgment in patent examination. We release our code and dataset to facilitate future research on patent examination modeling.
