Coding Triangle: How Does Large Language Model Understand Code?
July 8, 2025
Authors: Taolin Zhang, Zihan Ma, Maosong Cao, Junnan Liu, Songyang Zhang, Kai Chen
cs.AI
Abstract
Large language models (LLMs) have achieved remarkable progress in code
generation, yet their true programming competence remains underexplored. We
introduce the Code Triangle framework, which systematically evaluates LLMs
across three fundamental dimensions: editorial analysis, code implementation,
and test case generation. Through extensive experiments on competitive
programming benchmarks, we reveal that while LLMs can form a self-consistent
system across these dimensions, their solutions often lack the diversity and
robustness of human programmers. We identify a significant distribution shift
between model cognition and human expertise, with model errors tending to
cluster due to training data biases and limited reasoning transfer. Our study
demonstrates that incorporating human-generated editorials, solutions, and
diverse test cases, as well as leveraging model mixtures, can substantially
enhance both the performance and robustness of LLMs. Furthermore, we reveal
both the consistency and inconsistency in the cognition of LLMs that may
facilitate self-reflection and self-improvement, providing a potential
direction for developing more powerful coding models.
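The self-consistency versus distribution-shift finding can be illustrated with a toy harness: a model's solution is scored both against tests the model generated itself and against a held-out human-written suite. All names, the toy problem, and the test data below are illustrative assumptions, not the paper's actual benchmark or protocol.

```python
# Hypothetical sketch of a Code-Triangle-style consistency check.
# A solution that looks self-consistent (passes its own generated tests)
# can still fail on a more diverse human test suite -- mirroring the
# distribution shift the abstract describes.

from typing import Callable, List, Tuple

TestCase = Tuple[str, str]  # (input, expected output)

def pass_rate(solution: Callable[[str], str], tests: List[TestCase]) -> float:
    """Fraction of test cases the solution answers correctly."""
    if not tests:
        return 0.0
    passed = sum(1 for inp, out in tests if solution(inp) == out)
    return passed / len(tests)

# Toy problem: reverse a string. This (deliberately buggy) "model
# solution" strips whitespace first, an edge case its own tests miss.
def model_solution(s: str) -> str:
    return s.strip()[::-1]

# Tests the same model generated: clustered, no whitespace edge cases.
model_tests = [("abc", "cba"), ("", ""), ("aa", "aa")]
# A (hypothetical) human suite with a whitespace edge case.
human_tests = [("abc", "cba"), ("a b", "b a"), ("xy\n", "\nyx")]

print(f"self-consistency:  {pass_rate(model_solution, model_tests):.2f}")
print(f"cross-consistency: {pass_rate(model_solution, human_tests):.2f}")
```

Here the solution passes 100% of its own tests but only two of the three human tests, a small-scale analogue of the clustered model errors reported in the paper.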