Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models

Abstract

The complementary potential of large language models (LLMs) rests on the assumption that off-the-shelf LLMs have heterogeneous expertise across a wide range of domains and tasks, so that an ensemble of LLMs can achieve consistently better performance. Existing ensemble methods for LLMs mainly focus on ranking outputs with a reward model, which incurs significant computation overhead. To address this issue, we revisit the complementary potential of LLMs and elaborate on it by mining latent expertise with off-the-shelf reward models. We propose Zooter, a reward-guided routing method that distills rewards on training queries to train a routing function, which can precisely distribute each query to the LLM with expertise in it. We also integrate a tag-based label enhancement to mitigate noise from uncertainty when using rewards as silver supervision. Zooter is computationally efficient at inference, as it introduces only the minor overhead of a routing function compared with reward model ranking methods. We evaluate Zooter on a comprehensive benchmark collection with 26 subsets covering different domains and tasks. Zooter outperforms the best single model on average and ranks first on 44% of tasks, even surpassing multiple reward model ranking methods.
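
To make the idea of distilling rewards into a routing function more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that per-model rewards are normalized with a softmax into soft labels and that a small linear router is fit to those labels with a cross-entropy objective; the abstract confirms neither detail. The names `softmax`, `train_router`, and `route`, the toy one-hot query features, and the reward values are all hypothetical, and the tag-based label enhancement is omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def logits_for(weights, x):
    """One logit per candidate model: dot product of its weight row with x."""
    return [sum(w_j * x_j for w_j, x_j in zip(row, x)) for row in weights]


def train_router(features, rewards, n_models, lr=0.5, epochs=200):
    """Fit a linear router to soft labels derived from rewards.

    Assumption: soft labels are softmax-normalized rewards, and the router
    minimizes cross-entropy against them with plain gradient descent.
    """
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(n_models)]
    targets = [softmax(r) for r in rewards]
    for _ in range(epochs):
        for x, target in zip(features, targets):
            probs = softmax(logits_for(weights, x))
            for k in range(n_models):
                # Gradient of cross-entropy with respect to logit k.
                grad = probs[k] - target[k]
                for j in range(dim):
                    weights[k][j] -= lr * grad * x[j]
    return weights


def route(weights, x):
    """Return the index of the model the router prefers for query features x."""
    logits = logits_for(weights, x)
    return max(range(len(logits)), key=logits.__getitem__)


# Hypothetical toy data: two query types (one-hot features) and two models.
# Each reward row holds made-up reward-model scores for the two models' answers.
features = [[1.0, 0.0], [0.0, 1.0]]
rewards = [[2.0, 0.5], [0.3, 1.8]]

weights = train_router(features, rewards, n_models=2)
chosen_for_first = route(weights, [1.0, 0.0])
chosen_for_second = route(weights, [0.0, 1.0])
```

In this sketch only the router runs at inference, and a single chosen model then generates the answer. This is the source of the efficiency the abstract describes relative to reward model ranking, which must generate and score an output from every candidate model.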
