Do AI Coding Agents Log Like Humans? An Empirical Study
April 10, 2026
Authors: Youssef Esseddiq Ouatiti, Mohammed Sayagh, Hao Li, Ahmed E. Hassan
cs.AI
Abstract
Software logging is essential for maintaining and debugging complex systems, yet it remains unclear how AI coding agents handle this non-functional requirement. While prior work characterizes human logging practices, the behaviors of AI coding agents and the efficacy of natural language instructions in governing them are unexplored. To address this gap, we conduct an empirical study of 4,550 agentic pull requests across 81 open-source repositories. We compare agent logging patterns against human baselines and analyze the impact of explicit logging instructions. We find that agents change logging less often than humans in 58.4% of repositories, though they exhibit higher log density when they do. Furthermore, explicit logging instructions are rare (4.7%) and ineffective, as agents fail to comply with constructive requests 67% of the time. Finally, we observe that humans perform 72.5% of post-generation log repairs, acting as "silent janitors" who fix logging and observability issues without explicit review feedback. These findings indicate a dual failure in natural language instruction (i.e., scarcity of logging instructions and low agent compliance), suggesting that deterministic guardrails might be necessary to ensure consistent logging practices.
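The measurements summarized above (whether a pull request changes logging at all, and the log density of the lines it adds) can be sketched on a unified diff. This is a minimal illustration, not the paper's methodology: the log-call regex, the density definition (logging lines over added lines), and the function names are all assumptions introduced here for clarity.

```python
import re

# Hypothetical pattern for logging calls; the paper's exact detection
# rules are not given here, so this regex is an illustrative assumption.
LOG_CALL = re.compile(
    r"\b(?:logger|log|logging)\.(?:debug|info|warning|error|critical)\s*\("
)

def added_lines(diff: str) -> list[str]:
    """Lines added by a unified diff: those starting with '+',
    excluding the '+++' file header."""
    return [
        line[1:]
        for line in diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]

def changes_logging(diff: str) -> bool:
    """Whether the change touches logging at all (agents did this
    less often than humans in 58.4% of the studied repositories)."""
    return any(LOG_CALL.search(line) for line in added_lines(diff))

def log_density(diff: str) -> float:
    """Fraction of added lines that are logging statements;
    returns 0.0 when the diff adds no lines."""
    lines = added_lines(diff)
    if not lines:
        return 0.0
    return sum(bool(LOG_CALL.search(line)) for line in lines) / len(lines)
```

For example, a three-line addition containing one `logger.info(...)` call would count as a logging change with a density of one third.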