The Debugging Decay Index: Rethinking Debugging Strategies for Code LLMs

Abstract

The effectiveness of AI debugging follows a predictable exponential decay pattern: although iterative debugging is a critical capability for practical code generation systems, most models lose 60-80% of their debugging capability within just 2-3 attempts. We introduce the Debugging Decay Index (DDI), a mathematical framework that quantifies when debugging becomes ineffective and predicts when to intervene. Our strategic fresh start approach shifts from exploitation to exploration at key points in the debugging process, demonstrating that well-timed interventions can rescue debugging effectiveness. DDI reveals a fundamental limitation in current AI debugging and provides the first quantitative framework for optimising iterative code generation strategies.
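
The abstract does not give the DDI formula, so the following is only an illustrative sketch of what an exponential decay model with a restart threshold might look like. It assumes effectiveness after attempt t follows E(t) = E0 * exp(-lambda * t); the function names, the parameters `e0`, `decay_rate`, and `loss_threshold`, and the thresholding rule are all hypothetical and are not taken from the paper.

```python
import math


def effectiveness(attempt: int, e0: float, decay_rate: float) -> float:
    """Assumed model (not the paper's): E(t) = e0 * exp(-decay_rate * t)."""
    return e0 * math.exp(-decay_rate * attempt)


def decay_rate_from_loss(loss_fraction: float, attempts: int) -> float:
    """Decay rate implied by losing `loss_fraction` of effectiveness
    after `attempts` debugging attempts under the assumed model."""
    if not 0.0 < loss_fraction < 1.0:
        raise ValueError("loss_fraction must be strictly between 0 and 1")
    return -math.log(1.0 - loss_fraction) / attempts


def restart_attempt(decay_rate: float, loss_threshold: float) -> int:
    """Smallest whole attempt count at which the modelled loss reaches
    `loss_threshold`; a hypothetical trigger for a fresh start."""
    if decay_rate <= 0.0:
        raise ValueError("decay_rate must be positive")
    if not 0.0 < loss_threshold < 1.0:
        raise ValueError("loss_threshold must be strictly between 0 and 1")
    return math.ceil(-math.log(1.0 - loss_threshold) / decay_rate)


if __name__ == "__main__":
    # Rate implied by the abstract's upper figures: an 80% loss within 3 attempts.
    rate = decay_rate_from_loss(0.8, 3)
    print(f"implied decay rate: {rate:.3f}")
    # With an illustrative rate of 0.5, find where a 60% loss would be reached.
    print("restart after attempt:", restart_attempt(0.5, 0.6))
```

Under this assumed model, a faster decay rate moves the restart point earlier, which matches the abstract's claim that the timing of the intervention matters; the actual DDI may define decay and intervention points differently.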
