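Optimizing Retrieval-Augmented Generation: An In-Depth Analysis of the Impact of Hyperparameters on Performance and Efficiency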

Abstract

Large language models achieve high task performance yet often hallucinate or rely on outdated knowledge. Retrieval-augmented generation (RAG) addresses these gaps by coupling generation with external search. We analyse how hyperparameters influence speed and quality in RAG systems, covering the Chroma and Faiss vector stores, chunking policies, cross-encoder re-ranking, and temperature, and we evaluate six metrics: faithfulness, answer correctness, answer relevancy, context precision, context recall, and answer similarity. Chroma processes queries 13% faster, whereas Faiss yields higher retrieval precision, revealing a clear speed-accuracy trade-off. Naive fixed-length chunking with small windows and minimal overlap outperforms semantic segmentation while remaining the quickest option. Re-ranking provides modest gains in retrieval quality yet increases runtime roughly fivefold, so its usefulness depends on latency constraints. These results help practitioners balance computational cost and accuracy when tuning RAG systems for transparent, up-to-date responses. Finally, we re-evaluate the top configurations with a corrective RAG workflow and show that their advantages persist when the model can iteratively request additional evidence. We obtain near-perfect context precision (99%), which demonstrates that RAG systems can achieve extremely high retrieval accuracy with the right combination of hyperparameters. This has significant implications for applications where retrieval quality directly impacts downstream task performance, such as clinical decision support in healthcare.
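
To make the chunking result concrete, the following is a minimal sketch of naive fixed-length chunking with a small overlap. It is an illustration under stated assumptions, not the implementation evaluated in this work: the function name `chunk_fixed`, the character-based window, and the default `size` and `overlap` values are all hypothetical, since the abstract does not state the window sizes or the unit (characters or tokens) used.

```python
def chunk_fixed(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-length windows that share `overlap` characters.

    Assumption: windows are measured in characters; a token-based
    splitter would follow the same logic on a token list.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        # Stop once a window has reached the end of the text, so no
        # trailing chunk is emitted that only repeats the overlap.
        if start + size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    # Hypothetical toy input: windows of 4 characters sharing 1 character.
    print(chunk_fixed("abcdefghij", size=4, overlap=1))
```

Smaller `size` values yield more, shorter chunks to embed and index, and `overlap` controls how much context adjacent chunks share. Semantic segmentation would instead place boundaries by content, which requires extra computation per document.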