Coding Triangle: How Does Large Language Model Understand Code?

Abstract

Large language models (LLMs) have achieved remarkable progress in code generation, yet their true programming competence remains underexplored. We introduce the Code Triangle framework, which systematically evaluates LLMs across three fundamental dimensions: editorial analysis, code implementation, and test case generation. Through extensive experiments on competitive programming benchmarks, we show that while LLMs can form a self-consistent system across these dimensions, their solutions often lack the diversity and robustness of solutions written by human programmers. We identify a significant distribution shift between model cognition and human expertise, with model errors tending to cluster due to training data biases and limited reasoning transfer. Our study demonstrates that incorporating human-generated editorials, solutions, and diverse test cases, as well as leveraging model mixtures, can substantially enhance both the performance and the robustness of LLMs. Furthermore, we show that both the consistency and the inconsistency in LLM cognition may facilitate self-reflection and self-improvement, suggesting a potential direction for developing more powerful coding models.
