面向Python的神经调试器探索

摘要

基於Python執行追蹤資料訓練大型語言模型，能使其紮根於程式碼執行邏輯，實現對完整Python程式的逐行執行預測，從而將其轉化為神經網路解釋器（FAIR CodeGen團隊等，2025）。然而開發者極少逐行執行程式，而是透過偵錯工具在特定中斷點暫停執行，僅在檢查或修改程式變數時逐步執行相關片段。現有神經解釋器方法缺乏此類互動控制能力。為解決此侷限，我們提出神經偵錯器：這種語言模型能模擬傳統偵錯工具，支援單步進入、跳過或跳出函式等操作，並可在特定原始碼行設置中斷點。我們證明，無論是透過微調大型語言模型或從頭預訓練較小模型獲得的神經偵錯器，皆能可靠地建模正向執行（預測未來狀態與輸出）與逆向執行（推斷過往狀態或輸入），並以偵錯器操作為條件。在CruxEval基準測試中，我們的模型在輸出與輸入預測任務上均表現優異，展現出強大的條件化執行建模能力。本研究為未來具自主性的程式設計系統邁出第一步：神經偵錯器可作為模擬偵錯環境的世界模型，提供執行反饋或使智能體能與真實偵錯工具互動。此能力為更強大的程式碼生成、程式理解與自動化偵錯奠定基礎。

English

Training large language models (LLMs) on Python execution traces grounds them in code execution and enables the line-by-line execution prediction of whole Python programs, effectively turning them into neural interpreters (FAIR CodeGen Team et al., 2025). However, developers rarely execute programs step by step; instead, they use debuggers to stop execution at certain breakpoints and step through relevant portions only while inspecting or modifying program variables. Existing neural interpreter approaches lack such interactive control. To address this limitation, we introduce neural debuggers: language models that emulate traditional debuggers, supporting operations such as stepping into, over, or out of functions, as well as setting breakpoints at specific source lines. We show that neural debuggers -- obtained via fine-tuning large LLMs or pre-training smaller models from scratch -- can reliably model both forward execution (predicting future states and outputs) and inverse execution (inferring prior states or inputs) conditioned on debugger actions. Evaluated on CruxEval, our models achieve strong performance on both output and input prediction tasks, demonstrating robust conditional execution modeling. Our work takes first steps towards future agentic coding systems in which neural debuggers serve as a world model for simulated debugging environments, providing execution feedback or enabling agents to interact with real debugging tools. This capability lays the foundation for more powerful code generation, program understanding, and automated debugging.

面向Python的神经调试器探索

Towards a Neural Debugger for Python

摘要

Support