DEI: Diversity in Evolutionary Inference for Quality-Diversity Search

Abstract

We present DEI (Diversity in Evolutionary Inference), a distributed Quality-Diversity (QD) search framework that assigns heterogeneous large language models (LLMs) as mutation operators across peer nodes that communicate through non-blocking collective operations. Unlike homogeneous parallel search, which replicates a single model's inductive biases across all workers, DEI treats each LLM's distinct creative prior as a complementary source of behavioral novelty. DEI extends the Digital Red Queen framework: at the end of each round, nodes share their local optimal solutions, which seed the next round's population. This creates cross-model adversarial pressure that drives robustness beyond what intra-model self-play achieves. We evaluate DEI on Core War, a competitive programming benchmark in which Redcode warrior programs battle inside a simulated machine. At an equal total LLM-call budget, a four-node heterogeneous ensemble (GPT-5.4-mini, Claude Sonnet 4.6, GPT-5.2, and Claude Haiku 4.5) achieves a 124 percent higher merged-archive QD-Score (45.90 vs. 20.46) and 28 percent higher coverage (80.6 percent vs. 63.0 percent of cells) than a single-node baseline. The heterogeneous ensemble also outperforms an equally budgeted homogeneous ensemble on QD-Score, coverage, and held-out solution generality across all four model families. These results provide the first empirical evidence that model diversity, not merely parallelism, is the key driver of gains in distributed LLM-based QD search.
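To make the two headline metrics concrete, here is a minimal Python sketch that assumes the conventional MAP-Elites definitions. An archive maps each behavior-descriptor cell to the fitness of its elite. QD-Score is the sum of elite fitnesses, and coverage is the fraction of occupied cells. The abstract does not state DEI's exact normalization, and the helper names (`merge_archives`, `qd_score`, `coverage`, `relative_gain`) are hypothetical. The "merged archive" is assumed to be the cell-wise union of the per-node archives, keeping the fitter elite wherever two nodes occupy the same cell.

```python
from typing import Dict, Tuple

# Hypothetical archive layout: behavior-descriptor cell -> fitness of its elite.
Cell = Tuple[int, int]
Archive = Dict[Cell, float]


def merge_archives(*archives: Archive) -> Archive:
    """Cell-wise union of per-node archives, keeping the fitter elite per cell."""
    merged: Archive = {}
    for archive in archives:
        for cell, fitness in archive.items():
            if cell not in merged or fitness > merged[cell]:
                merged[cell] = fitness
    return merged


def qd_score(archive: Archive) -> float:
    """Sum of elite fitnesses (assumes fitness is already non-negative)."""
    return sum(archive.values())


def coverage(archive: Archive, total_cells: int) -> float:
    """Fraction of the behavior grid that holds an elite."""
    return len(archive) / total_cells


def relative_gain(new: float, baseline: float) -> float:
    """Percentage improvement of `new` over `baseline`."""
    return (new / baseline - 1.0) * 100.0


if __name__ == "__main__":
    node_a = {(0, 0): 1.0, (0, 1): 2.0}
    node_b = {(0, 1): 3.0, (1, 1): 0.5}
    merged = merge_archives(node_a, node_b)
    print(merged)               # {(0, 0): 1.0, (0, 1): 3.0, (1, 1): 0.5}
    print(qd_score(merged))     # 4.5
    print(coverage(merged, 4))  # 0.75
    # Recompute the abstract's reported gains from its raw numbers.
    print(round(relative_gain(45.90, 20.46)))  # 124
    print(round(relative_gain(80.6, 63.0)))    # 28
```

The last two lines show that the 28 percent coverage gain is relative (80.6 / 63.0), not a difference in percentage points, which would be 17.6 points.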
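The round structure described above can be sketched as a toy, single-process illustration. This is not DEI's implementation, and it rests on several assumptions. Warriors are integers. Each "LLM mutation operator" is a hypothetical deterministic function with its own bias. A warrior's score is the number of previous-round champions it beats under a made-up comparison rule. The hypothetical `allgather` helper is a synchronous stand-in for the non-blocking collective the abstract mentions; in an MPI deployment that role could be filled by `MPI_Iallgather`. What the sketch does capture is cross-model seeding: after each exchange, every node restarts from all nodes' champions, so it can build on solutions its own model would not have produced.

```python
from typing import Callable, List

# Toy stand-ins (hypothetical): a "warrior" is an int, and each mutator
# plays the role of one LLM with its own creative bias.
Warrior = int
Mutator = Callable[[Warrior], Warrior]


def battle_score(candidate: Warrior, opponents: List[Warrior]) -> int:
    """Toy battle: the candidate beats every opponent with a smaller value."""
    return sum(candidate > opponent for opponent in opponents)


def allgather(local: List[Warrior]) -> List[List[Warrior]]:
    """Synchronous stand-in for a collective all-gather:
    every node receives every node's contribution."""
    return [list(local) for _ in local]


def run_rounds(mutators: List[Mutator], seed: Warrior,
               rounds: int, calls_per_node: int) -> List[Warrior]:
    """Evolve one population per node and exchange champions after each round."""
    opponents = [seed]  # champions the next round must beat
    populations = [[seed] for _ in mutators]
    for _ in range(rounds):
        champions = []
        for population, mutate in zip(populations, mutators):
            for _ in range(calls_per_node):  # one "LLM call" per mutation
                population.append(mutate(population[-1]))
            # Local optimum: best score against last round's champions,
            # with ties broken by the larger warrior value.
            champions.append(max(population,
                                 key=lambda w: (battle_score(w, opponents), w)))
        gathered = allgather(champions)
        # Every node seeds its next population with all nodes' champions.
        populations = [list(received) for received in gathered]
        opponents = gathered[0]
    return opponents


if __name__ == "__main__":
    nodes = [lambda w: w + 1, lambda w: w + 2]  # two differently biased "models"
    print(run_rounds(nodes, seed=0, rounds=2, calls_per_node=2))  # [6, 8]
```

In this run, node 0 begins its second round from 4, a champion produced by node 1. The total number of mutation calls is `len(mutators) * rounds * calls_per_node` (8 here). Under this toy accounting, an equal-budget single-node baseline would use one mutator and multiply `calls_per_node` by the node count.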