InfoSynth: Information-Guided Benchmark Synthesis for LLMs
January 2, 2026
Authors: Ishir Garg, Neel Kolhe, Xuandong Zhao, Dawn Song
cs.AI
Abstract
Large language models (LLMs) have made significant advances in reasoning and code generation. However, efficiently creating new benchmarks to evaluate these capabilities remains a challenge. Traditional benchmark creation relies on manual human effort, a process that is both expensive and time-consuming. Furthermore, existing benchmarks often leak into LLM training data, so novel and diverse benchmarks are needed to accurately assess the models' genuine capabilities. This work introduces InfoSynth, a novel framework for automatically generating and evaluating reasoning benchmarks guided by information-theoretic principles. We propose metrics based on KL divergence and entropy that quantify benchmark novelty and diversity without relying on costly model evaluations. Building on this framework, we develop an end-to-end pipeline that synthesizes robust Python coding problems from seed datasets using genetic algorithms and iterative code feedback. Our method generates accurate test cases and solutions for new problems 97% of the time, and the synthesized benchmarks consistently exhibit higher novelty and diversity than their seed datasets. Moreover, our algorithm provides a way to control the novelty/diversity and difficulty of the generated problems. InfoSynth offers a scalable, self-verifying pipeline for constructing high-quality, novel, and diverse benchmarks for LLMs. Project Page: https://ishirgarg.github.io/infosynth_web/
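The abstract names KL divergence and entropy as the basis of the novelty and diversity metrics but does not spell out the estimator. The following is a minimal sketch of one possible reading, not the authors' implementation. It assumes each problem has already been mapped to a discrete bin, for example a cluster ID derived from some embedding model. The binning step, the function names `empirical_distribution`, `entropy`, and `kl_divergence`, and the smoothing constant `eps` are all hypothetical. Under this reading, novelty is KL(P_synth || P_seed), which grows when synthesized problems place mass where the seed set has little. Diversity is the entropy H(P_synth) of the synthesized set. Neither quantity requires running a model on the benchmark.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable

# A discrete distribution over (hypothetical) problem bins, e.g. cluster IDs.
Distribution = Dict[Hashable, float]


def empirical_distribution(labels: Iterable[Hashable]) -> Distribution:
    """Turn per-problem bin labels into a probability distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one labelled problem")
    return {label: n / total for label, n in counts.items()}


def entropy(p: Distribution) -> float:
    """Shannon entropy H(P) in nats; higher when problems spread over more bins."""
    return -sum(v * math.log(v) for v in p.values() if v > 0)


def kl_divergence(p: Distribution, q: Distribution, eps: float = 1e-9) -> float:
    """KL(P || Q) in nats, with additive smoothing of Q.

    A bin that appears in P but never in Q would make the divergence infinite,
    so every bin in the joint support receives eps extra mass in Q, and Q is
    renormalized so that it still sums to 1.
    """
    support = set(p) | set(q)
    norm = 1.0 + eps * len(support)
    total = 0.0
    for k in support:
        pk = p.get(k, 0.0)
        if pk > 0:  # terms with P(k) = 0 contribute nothing
            qk = (q.get(k, 0.0) + eps) / norm
            total += pk * math.log(pk / qk)
    return total


if __name__ == "__main__":
    # Hypothetical topic labels; a real pipeline would derive these from the problems.
    seed = empirical_distribution(["sorting", "sorting", "graphs", "graphs"])
    synth = empirical_distribution(["dp", "dp", "graphs", "strings"])
    print(f"novelty   KL(synth || seed) = {kl_divergence(synth, seed):.3f} nats")
    print(f"diversity H(synth)          = {entropy(synth):.3f} nats")
```

Smoothing Q, rather than dropping unseen bins, keeps the divergence finite in exactly the case a novelty metric should reward: synthesized problems that land in bins the seed set never covers. The paper's actual estimator may differ from this sketch, for instance by working directly on continuous embeddings instead of discrete bins.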