PIPer: On-Device Environment Setup via Online Reinforcement Learning

Abstract

Environment setup, the process of configuring a system to work with a specific software project, represents a persistent challenge in Software Engineering (SE). Automated environment setup methods could assist developers by providing fully configured environments for arbitrary repositories without manual effort. Such methods would also help SE researchers scale execution-based benchmarks. However, recent studies reveal that even state-of-the-art Large Language Models (LLMs) achieve limited success in automating this task. To address this limitation, we tune a specialized model for environment setup. We combine supervised fine-tuning, which teaches the model to generate correct Bash scripts, with Reinforcement Learning with Verifiable Rewards (RLVR), which adapts it to the task of environment setup. On EnvBench-Python, our method enables Qwen3-8B (a model runnable on consumer hardware) to perform on par with larger models, namely Qwen3-32B and GPT-4o. The training code and model checkpoints are available online: https://github.com/JetBrains-Research/PIPer.
