PIPer: On-Device Environment Setup via Online Reinforcement Learning
September 29, 2025
Authors: Alexander Kovrigin, Aleksandra Eliseeva, Konstantin Grotov, Egor Bogomolov, Yaroslav Zharov
cs.AI
Abstract
Environment setup, the process of configuring the system to work with a
specific software project, represents a persistent challenge in Software
Engineering (SE). Automated environment setup methods could assist developers
by providing fully configured environments for arbitrary repositories without
manual effort. Such methods would also help SE researchers scale execution-based
benchmarks. However, recent studies reveal that even state-of-the-art Large
Language Models (LLMs) achieve limited success in automating this task. To
address this limitation, we tune a specialized model for environment setup. We
combine supervised fine-tuning, which teaches the model to generate correct Bash
scripts, with Reinforcement Learning with Verifiable Rewards (RLVR), which adapts
it to the task of environment setup. On EnvBench-Python, our method enables
Qwen3-8B (a model runnable on consumer hardware) to perform on par with larger
models, namely Qwen3-32B and GPT-4o. The training code and model checkpoints are available online:
https://github.com/JetBrains-Research/PIPer.