DEI: Diversity in Evolutionary Inference for Quality-Diversity Search

Abstract

We present DEI (Diversity in Evolutionary Inference), a distributed Quality-Diversity (QD) search framework that assigns heterogeneous large language models (LLMs) as mutation operators across peer nodes that communicate through non-blocking collective operations. Unlike homogeneous parallel search, which replicates a single model's inductive biases across all workers, DEI treats each LLM's distinct creative prior as a complementary source of behavioral novelty. We extend the Digital Red Queen framework with DEI: at the end of each round, nodes share their locally optimal solutions, which seed the next round's population. This creates cross-model adversarial pressure that drives robustness beyond intra-model self-play. We evaluate on the Core War domain, a competitive programming benchmark in which Redcode warrior programs battle inside a simulated machine. A four-node heterogeneous ensemble (GPT-5.4-mini, Claude Sonnet 4.6, GPT-5.2, and Claude Haiku 4.5) achieves a 124 percent higher merged-archive QD-Score (45.90 vs. 20.46) and 28 percent higher relative coverage (80.6 percent vs. 63.0 percent of cells) than a single-node baseline at an equal total LLM-call budget. The heterogeneous ensemble also outperforms an equally budgeted homogeneous ensemble on QD-Score, coverage, and held-out solution generality across all four model families. These results provide the first empirical evidence that model diversity, not merely parallelism, is the key driver of gains in distributed LLM-based QD search.
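
To make the reported metrics concrete, the following minimal Python sketch shows one way a merged archive, its QD-Score, and its coverage could be computed, and checks the relative gains quoted above. It assumes the conventional definitions (QD-Score as the sum of the best fitness over filled cells, coverage as the fraction of filled cells); the abstract does not state the exact definitions used, and the names `Archive`, `merge_archives`, `qd_score`, `coverage`, and `relative_gain` are hypothetical, chosen only for illustration.

```python
from typing import Dict, Hashable, Iterable

# Hypothetical archive representation: cell key -> best fitness found in that cell.
Archive = Dict[Hashable, float]


def merge_archives(archives: Iterable[Archive]) -> Archive:
    """Merge per-node archives, keeping the best fitness seen for each cell."""
    merged: Archive = {}
    for archive in archives:
        for cell, fitness in archive.items():
            if cell not in merged or fitness > merged[cell]:
                merged[cell] = fitness
    return merged


def qd_score(archive: Archive) -> float:
    """Assumed definition: sum of elite fitness over all filled cells."""
    return sum(archive.values())


def coverage(archive: Archive, total_cells: int) -> float:
    """Assumed definition: fraction of cells that contain an elite."""
    return len(archive) / total_cells


def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Toy archives from two nodes on a 2x2 behavior grid (illustrative values only).
node_a = {(0, 0): 1.0, (0, 1): 0.5}
node_b = {(0, 1): 0.75, (1, 1): 0.25}
merged = merge_archives([node_a, node_b])

# Relative gains recomputed from the figures reported in the abstract.
qd_gain = relative_gain(45.90, 20.46)
coverage_gain = relative_gain(80.6, 63.0)
```

Under these assumptions, the reported figures are consistent: 45.90 against 20.46 is a relative gain of roughly 124 percent, and 80.6 against 63.0 is a relative gain of roughly 28 percent (17.6 percentage points in absolute terms).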