Do AI Coding Agents Log Like Humans? An Empirical Study

Abstract

Software logging is essential for maintaining and debugging complex systems, yet it remains unclear how AI coding agents handle this non-functional requirement. While prior work characterizes human logging practices, the behaviors of AI coding agents and the efficacy of natural language instructions in governing them are unexplored. To address this gap, we conduct an empirical study of 4,550 agentic pull requests across 81 open-source repositories. We compare agent logging patterns against human baselines and analyze the impact of explicit logging instructions. We find that agents change logging less often than humans in 58.4% of repositories, though they exhibit higher log density when they do. Furthermore, explicit logging instructions are rare (4.7%) and ineffective, as agents fail to comply with constructive requests 67% of the time. Finally, we observe that humans perform 72.5% of post-generation log repairs, acting as "silent janitors" who fix logging and observability issues without explicit review feedback. These findings indicate a dual failure in natural language instruction (i.e., scarcity of logging instructions and low agent compliance), suggesting that deterministic guardrails might be necessary to ensure consistent logging practices.
