Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning

Abstract

Combining existing pre-trained expert LLMs is a promising avenue for scalably tackling large-scale and diverse tasks. However, selecting experts at the task level is often too coarse-grained, because heterogeneous tasks may require different expertise for each instance. To enable adaptive instance-level mixing of pre-trained LLM experts, we propose Symbolic-MoE, a symbolic, text-based, and gradient-free Mixture-of-Experts framework. Symbolic-MoE takes a fine-grained approach to selection by emphasizing skills, e.g., algebra in math or molecular biology in biomedical reasoning. We propose a skill-based recruiting strategy that dynamically selects the most relevant set of expert LLMs for diverse reasoning tasks based on their strengths. Each selected expert then generates its own reasoning, yielding k outputs from k experts, which are synthesized into a final high-quality response by an aggregator chosen for its ability to integrate diverse reasoning outputs. We show that Symbolic-MoE's instance-level expert selection improves performance by a large margin but, when implemented naively, can introduce high computational overhead because models must be constantly loaded and offloaded. To address this, we implement a batch inference strategy that groups instances based on their assigned experts, loading each model only once. This allows us to integrate 16 expert models on 1 GPU with a time cost comparable to or better than prior multi-agent baselines using 4 GPUs. Through extensive evaluations on diverse benchmarks (MMLU-Pro, GPQA, AIME, and MedMCQA), we demonstrate that Symbolic-MoE outperforms strong LLMs like GPT4o-mini, as well as multi-agent approaches, with an absolute average improvement of 8.15% over the best multi-agent baseline. Moreover, Symbolic-MoE removes the need for expensive multi-round discussions, outperforming discussion baselines with less computation.
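
The recruiting and batching ideas in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the expert names, the skill scores, the summed-score ranking in `recruit_experts`, and the helper `group_by_expert` are hypothetical and are not taken from the paper. The aggregator step is not modeled.

```python
from collections import defaultdict

# Hypothetical skill profiles (expert -> skill -> score); values are illustrative only.
EXPERT_PROFILES = {
    "expert_a": {"algebra": 0.9, "molecular_biology": 0.2},
    "expert_b": {"algebra": 0.4, "molecular_biology": 0.8},
    "expert_c": {"algebra": 0.7, "molecular_biology": 0.6},
}


def recruit_experts(skills, profiles, k):
    """Pick the k experts with the highest summed score over an instance's skills.

    The summed-score ranking is an assumed stand-in for a skill-based
    recruiting rule; ties are broken by expert name for determinism.
    """
    def suitability(name):
        return sum(profiles[name].get(skill, 0.0) for skill in skills)

    ranked = sorted(profiles, key=lambda name: (-suitability(name), name))
    return ranked[:k]


def group_by_expert(assignments):
    """Invert instance -> experts into expert -> instances.

    Each expert then needs to be loaded once and run on its whole batch,
    instead of being loaded and offloaded per instance.
    """
    batches = defaultdict(list)
    for instance_id, experts in assignments.items():
        for expert in experts:
            batches[expert].append(instance_id)
    return dict(batches)


# Hypothetical instances annotated with the skills they require.
instances = {
    "q1": ["algebra"],
    "q2": ["molecular_biology"],
    "q3": ["algebra", "molecular_biology"],
}

assignments = {
    instance_id: recruit_experts(skills, EXPERT_PROFILES, k=2)
    for instance_id, skills in instances.items()
}
batches = group_by_expert(assignments)

for expert, instance_ids in batches.items():
    # A real system would load the model here, run the batch, then release it.
    print(expert, instance_ids)
```

In this toy setup, three instances with two experts each would require six model loads if handled one instance at a time, whereas grouping by expert needs only three, one per distinct expert.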
