Optimizing Retrieval-Augmented Generation: Analysis of Hyperparameter Impact on Performance and Efficiency

Abstract

Large language models achieve high task performance yet often hallucinate or rely on outdated knowledge. Retrieval-augmented generation (RAG) addresses these gaps by coupling generation with external search. We analyse how hyperparameters influence speed and quality in RAG systems, covering Chroma and Faiss vector stores, chunking policies, cross-encoder re-ranking, and temperature, and we evaluate six metrics: faithfulness, answer correctness, answer relevancy, context precision, context recall, and answer similarity. Chroma processes queries 13% faster, whereas Faiss yields higher retrieval precision, revealing a clear speed-accuracy trade-off. Naive fixed-length chunking with small windows and minimal overlap outperforms semantic segmentation while remaining the quickest option. Re-ranking provides modest gains in retrieval quality yet increases runtime roughly fivefold, so its usefulness depends on latency constraints. These results help practitioners balance computational cost and accuracy when tuning RAG systems for transparent, up-to-date responses. Finally, we re-evaluate the top configurations with a corrective RAG workflow and show that their advantages persist when the model can iteratively request additional evidence. We obtain a near-perfect context precision (99%), demonstrating that RAG systems can achieve extremely high retrieval accuracy with the right combination of hyperparameters. This has significant implications for applications where retrieval quality directly impacts downstream task performance, such as clinical decision support in healthcare.
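Two ideas central to these findings, fixed-length chunking with a small overlap and rank-aware context precision, can be sketched briefly. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. The helper names `fixed_length_chunks` and `context_precision` are hypothetical; windows are measured in characters (the study may measure them in tokens); and the precision function assumes binary relevance labels for the retrieved chunks are already available, whereas evaluation frameworks typically obtain such labels from an LLM judge.

```python
from typing import List


def fixed_length_chunks(text: str, chunk_size: int = 256, overlap: int = 16) -> List[str]:
    """Split text into fixed-size character windows (hypothetical helper).

    Consecutive windows share `overlap` characters, so the window start
    advances by chunk_size - overlap each step. Character-based sizing is
    an assumption made for this sketch.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text, to avoid
        # emitting a trailing chunk that is fully contained in the last one.
        if start + chunk_size >= len(text):
            break
    return chunks


def context_precision(relevance: List[bool]) -> float:
    """Rank-aware precision over the top-K retrieved chunks (sketch).

    relevance[k] is True if the chunk at rank k + 1 is relevant. Returns the
    mean of precision@k taken at each rank k that holds a relevant chunk,
    or 0.0 when no retrieved chunk is relevant. Relevant chunks ranked
    early therefore score higher than the same chunks ranked late.
    """
    hits = 0
    total = 0.0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / k  # precision@k at this relevant rank
    return total / hits if hits else 0.0


if __name__ == "__main__":
    print(fixed_length_chunks("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
    print(round(context_precision([True, False, True]), 3))  # 0.833
```

Shrinking `chunk_size` and `overlap` in this sketch produces more, narrower chunks, which is the regime the abstract reports as both fastest and most accurate; the precision function shows why a re-ranker that moves relevant chunks toward the top raises context precision even when the retrieved set is unchanged.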