ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evolution

Abstract

The transition from sequential to parallel computing is essential for modern high-performance applications but is hindered by the steep learning curve of concurrent programming. This challenge is magnified for irregular data structures (such as sparse graphs, unbalanced trees, and non-uniform meshes), where static scheduling fails and data dependencies are unpredictable. Current Large Language Models (LLMs) often fail catastrophically on these tasks, generating code plagued by subtle race conditions, deadlocks, and suboptimal scaling. We bridge this gap with ParEVO, a framework designed to synthesize high-performance parallel algorithms for irregular data. Our contributions include: (1) the Parlay-Instruct Corpus, a curated dataset of 13,820 tasks synthesized via a "Critic-Refine" pipeline that explicitly filters for empirically performant algorithms that make effective use of Work-Span parallel primitives; (2) specialized DeepSeek, Qwen, and Gemini models fine-tuned to align probabilistic generation with the rigorous semantics of the ParlayLib library; and (3) an Evolutionary Coding Agent (ECA) that improves the "last mile" of correctness by iteratively repairing code using feedback from compilers, dynamic race detectors, and performance profilers. On the ParEval benchmark, ParEVO achieves an average 106x speedup (maximum 1103x) across the suite and a robust 13.6x speedup on complex irregular graph problems, outperforming state-of-the-art commercial models. Furthermore, our evolutionary approach matches state-of-the-art expert human baselines, achieving up to a 4.1x speedup on specific highly irregular kernels. Source code and datasets are available at https://github.com/WildAlg/ParEVO.
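To make the "Work-Span" vocabulary and the difficulty of irregular data concrete, the sketch below applies the standard work-span cost model to a skewed ("hub") degree distribution. It is a minimal illustration under a simplified unit-cost model that is our own assumption, not ParEVO's or ParlayLib's code: one step per edge visit, ceil(log2 k) levels for a balanced fork-join over k tasks, and Brent's bound T_p <= W/p + S for a greedy scheduler. All function names are hypothetical. It contrasts a parallel loop over vertices with a sequential inner loop over each adjacency list against the same loop with a nested parallel reduction.

```python
from dataclasses import dataclass


def fork_depth(k: int) -> int:
    """Depth of a balanced binary fork-join tree over k tasks, i.e. ceil(log2 k)."""
    # (k - 1).bit_length() equals ceil(log2 k) for k >= 1, using exact integer math
    return (k - 1).bit_length() if k > 1 else 0


@dataclass
class Cost:
    work: int  # total unit operations (W)
    span: int  # longest chain of dependent unit operations (S)

    def parallelism(self) -> float:
        """Average available parallelism W / S."""
        return self.work / self.span

    def brent_bound(self, p: int) -> float:
        """Brent's upper bound on greedy-scheduled time with p processors: W/p + S."""
        return self.work / p + self.span


def vertex_parallel_sequential_edges(degrees: list[int]) -> Cost:
    # Hypothetical variant A: parallel loop over vertices, but each vertex scans
    # its adjacency list sequentially, so the highest-degree vertex bounds the span.
    n, m = len(degrees), sum(degrees)
    return Cost(work=n + m, span=fork_depth(n) + max(degrees))


def vertex_parallel_nested_reduce(degrees: list[int]) -> Cost:
    # Hypothetical variant B: parallel loop over vertices, and each adjacency list
    # is combined by a balanced reduction tree (fork_depth(d) levels plus a leaf step).
    n, m = len(degrees), sum(degrees)
    inner = max(fork_depth(d) + 1 for d in degrees)
    return Cost(work=n + m, span=fork_depth(n) + inner)


if __name__ == "__main__":
    # Skewed degree distribution: one hub of degree 1000 and 999 vertices of degree 1.
    degrees = [1000] + [1] * 999
    variants = [
        ("sequential inner loop", vertex_parallel_sequential_edges),
        ("nested parallel reduce", vertex_parallel_nested_reduce),
    ]
    for name, cost_fn in variants:
        c = cost_fn(degrees)
        print(f"{name}: work={c.work} span={c.span} "
              f"parallelism={c.parallelism():.1f} T_64<={c.brent_bound(64):.1f}")
```

Under this toy model both variants perform the same work (2999 unit steps), but the single hub caps the first variant's span at 1010 steps versus 21 for the nested reduction. This is the kind of load imbalance that static, per-vertex parallelization fails to hide on irregular inputs.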