Optimizing Retrieval-Augmented Generation: Analysis of Hyperparameter Impact on Performance and Efficiency

Abstract

Large language models achieve high task performance yet often hallucinate or rely on outdated knowledge. Retrieval-augmented generation (RAG) addresses these gaps by coupling generation with external search. We analyse how hyperparameters influence speed and quality in RAG systems, covering Chroma and Faiss vector stores, chunking policies, cross-encoder re-ranking, and temperature, and we evaluate six metrics: faithfulness, answer correctness, answer relevancy, context precision, context recall, and answer similarity. Chroma processes queries 13% faster, whereas Faiss yields higher retrieval precision, revealing a clear speed-accuracy trade-off. Naive fixed-length chunking with small windows and minimal overlap outperforms semantic segmentation while remaining the quickest option. Re-ranking provides modest gains in retrieval quality yet increases runtime by roughly a factor of 5, so its usefulness depends on latency constraints. These results help practitioners balance computational cost and accuracy when tuning RAG systems for transparent, up-to-date responses. Finally, we re-evaluate the top configurations with a corrective RAG workflow and show that their advantages persist when the model can iteratively request additional evidence. We obtain near-perfect context precision (99%), demonstrating that RAG systems can achieve extremely high retrieval accuracy with the right combination of hyperparameters. This has significant implications for applications in which retrieval quality directly affects downstream task performance, such as clinical decision support in healthcare.
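To make the chunking finding concrete, the following is a minimal sketch of naive fixed-length chunking with overlap. It is an illustration only, not the authors' implementation: the function name `chunk_fixed`, the use of characters rather than tokens as the unit, and the default `size` and `overlap` values are all assumptions, since the abstract does not state the window sizes used.

```python
def chunk_fixed(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-length windows that share `overlap` characters.

    Hypothetical helper: units are characters and the defaults are
    placeholders, not values reported in the paper.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # distance between the starts of consecutive windows
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    # Small window with minimal overlap, in the spirit of the configuration
    # the abstract describes as both fastest and most accurate.
    for chunk in chunk_fixed("abcdefghij", size=4, overlap=1):
        print(chunk)
```

Because the splitter never inspects the text, its cost is a single linear pass, which is consistent with the abstract's observation that this policy is the quickest option.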
