Efficient Exploration for LLMs

Abstract

We present evidence of substantial benefit from efficient exploration in gathering human feedback to improve large language models. In our experiments, an agent sequentially generates queries while fitting a reward model to the feedback received. Our best-performing agent generates queries using double Thompson sampling, with uncertainty represented by an epistemic neural network. Our results demonstrate that efficient exploration enables high levels of performance with far fewer queries. Further, both uncertainty estimation and the choice of exploration scheme play critical roles.
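To make the query-generation step concrete, the following is a minimal sketch of how double Thompson sampling might select a pair of candidate responses for a human to compare. It is an illustration under stated assumptions, not the authors' implementation: the epistemic neural network is stood in for by a small matrix of reward samples (one row per hypothetical ensemble member, one column per candidate response), and the function name `double_thompson_sample`, the `max_tries` parameter, and the uniform-random fallback are choices made here for illustration.

```python
import numpy as np


def double_thompson_sample(reward_samples, rng, max_tries=30):
    """Pick two distinct candidate responses to form one feedback query.

    reward_samples: array of shape (num_particles, num_responses). Row i holds
    the rewards that hypothetical ensemble member i assigns to each candidate
    response. The ensemble is an assumed stand-in for an epistemic neural
    network.
    """
    num_particles, num_responses = reward_samples.shape
    if num_responses < 2:
        raise ValueError("need at least two candidate responses")

    # First response: sample one plausible reward function, act greedily on it.
    first = int(np.argmax(reward_samples[rng.integers(num_particles)]))

    # Second response: resample reward functions until the greedy choice differs.
    for _ in range(max_tries):
        second = int(np.argmax(reward_samples[rng.integers(num_particles)]))
        if second != first:
            return first, second

    # Assumed fallback: if every resample agrees, pick another response uniformly.
    others = [i for i in range(num_responses) if i != first]
    return first, int(rng.choice(others))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = rng.normal(size=(10, 5))  # 10 ensemble members, 5 candidate responses
    print(double_thompson_sample(samples, rng))
```

When the sampled reward functions disagree about which response is best, the pair tends to contain two responses that are each plausibly optimal, which is where a human preference label is most informative. When they all agree, the fallback still returns a valid pair.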
