ReSyn: A Generalized Recursive Regular Expression Synthesis Framework

June 13, 2026
Authors: Seongmin Kim, Hyunjoon Cheon, Su-Hyeon Kim, Yo-Sub Han, Sang-Ki Ko
cs.AI

Abstract

Existing Programming-By-Example (PBE) systems often rely on simplified benchmarks that fail to capture the high structural complexity of real-world regexes, such as deeper nesting and frequent use of union operations. To overcome the resulting performance drop, we propose ReSyn, a synthesizer-agnostic divide-and-conquer framework that decomposes a complex synthesis problem into manageable sub-problems. We also introduce Set2Regex, a parameter-efficient synthesizer that captures the permutation invariance of examples. Experimental results demonstrate that ReSyn significantly boosts accuracy across various synthesizers, and that its combination with Set2Regex establishes a new state of the art on a challenging real-world benchmark. The complete source code, datasets, and pre-trained model checkpoints are publicly available at https://github.com/mrseongminkim/ReSyn.
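
To make the divide-and-conquer idea concrete, the following is a minimal toy sketch, not the ReSyn algorithm itself. It assumes a pluggable base synthesizer (here a hypothetical `literal_synthesizer` that simply unions escaped literals) and a hypothetical splitting heuristic (`split_by_first_char`); the abstract specifies neither. The sketch only shows the general shape: split the positive examples into sub-problems, solve each one with any synthesizer, and join the partial results with a union.

```python
import re
from typing import Callable, Dict, List

# Any function mapping positive examples to a regex string can be plugged in.
Synthesizer = Callable[[List[str]], str]


def literal_synthesizer(examples: List[str]) -> str:
    """Hypothetical base synthesizer: a union of escaped literal strings.

    Sorting the de-duplicated examples makes the output independent of
    input order (by construction here, not learned as in a neural model).
    """
    return "|".join(re.escape(e) for e in sorted(set(examples)))


def split_by_first_char(examples: List[str]) -> List[List[str]]:
    """Hypothetical splitting heuristic: group examples by first character."""
    groups: Dict[str, List[str]] = {}
    for e in examples:
        groups.setdefault(e[:1], []).append(e)
    return [groups[key] for key in sorted(groups)]


def synthesize(examples: List[str], base: Synthesizer, max_size: int = 2) -> str:
    """Recursively split the example set, then union the sub-regexes."""
    if len(examples) <= max_size:
        return base(examples)
    parts = split_by_first_char(examples)
    if len(parts) == 1:
        # The heuristic cannot split further; hand the whole set to the base.
        return base(examples)
    return "|".join(f"(?:{synthesize(p, base, max_size)})" for p in parts)


examples = ["ab", "ac", "b1", "b2", "c"]
pattern = synthesize(examples, literal_synthesizer)
accepts_all = all(re.fullmatch(pattern, e) for e in examples)
```

Because the base synthesizer is passed in as an argument, the recursive driver does not depend on how sub-problems are solved, which mirrors the "synthesizer-agnostic" property described in the abstract. A real system would also use negative examples and a far more capable synthesizer.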