Guidelines to Prompt Large Language Models for Code Generation: An Empirical Characterization

Abstract

Large Language Models (LLMs) are now used extensively for software engineering tasks, primarily code generation. Previous research has shown how suitable prompt engineering can help developers improve their code generation prompts. So far, however, there are no specific guidelines that steer developers towards writing suitable prompts for code generation. In this work, we derive and evaluate development-specific prompt optimization guidelines. First, we use an iterative, test-driven approach to automatically refine code generation prompts, and we analyze the outcome of this process to identify prompt improvement items that lead to test passes. We use these items to elicit 10 guidelines for prompt improvement, related to better specifying I/O and pre- and post-conditions, providing examples, adding various types of details, and clarifying ambiguities. We then conduct an assessment with 50 practitioners, who report their usage of the elicited prompt improvement patterns as well as their perceived usefulness, which does not always correspond to their actual usage before knowing our guidelines. Our results have implications not only for practitioners and educators, but also for those aiming to create better LLM-aided software development tools.
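
The iterative, test-driven refinement mentioned above can be illustrated with a minimal sketch. The abstract does not describe the actual pipeline, so everything below is an assumption made for illustration: the names `refine_prompt`, `toy_generate`, `toy_run_tests`, and `toy_improve` are hypothetical, and the "model" is a deterministic stand-in rather than a real LLM call. The sketch only shows the general shape of such a loop: generate code, run tests, and feed the failures back into the prompt.

```python
from typing import Callable, List, Optional, Tuple


def refine_prompt(
    prompt: str,
    generate: Callable[[str], str],
    run_tests: Callable[[str], List[str]],
    improve: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> Tuple[str, Optional[str], List[str]]:
    """Hypothetical loop: regenerate code until the tests pass or rounds run out.

    Returns the last prompt tried, the passing code (or None), and the
    history of every prompt that was tried.
    """
    history = [prompt]
    for attempt in range(max_rounds):
        code = generate(prompt)
        failures = run_tests(code)
        if not failures:
            return prompt, code, history
        if attempt < max_rounds - 1:
            # Fold the failing-test feedback into the next prompt.
            prompt = improve(prompt, failures)
            history.append(prompt)
    return prompt, None, history


def toy_generate(prompt: str) -> str:
    """Stand-in for an LLM (assumption): handles [] only if the prompt asks."""
    if "empty list" in prompt:
        return "def mean(xs):\n    return sum(xs) / len(xs) if xs else 0.0\n"
    return "def mean(xs):\n    return sum(xs) / len(xs)\n"


def toy_run_tests(code: str) -> List[str]:
    """Execute the candidate code and return a list of failure messages."""
    namespace: dict = {}
    exec(code, namespace)  # acceptable here only because the input is a toy string
    mean = namespace["mean"]
    failures = []
    if mean([1, 2, 3]) != 2.0:
        failures.append("mean([1, 2, 3]) should return 2.0")
    try:
        if mean([]) != 0.0:
            failures.append("for an empty list, mean should return 0.0")
    except ZeroDivisionError:
        failures.append("mean([]) raised ZeroDivisionError; for an empty list return 0.0")
    return failures


def toy_improve(prompt: str, failures: List[str]) -> str:
    """Append the failing-test feedback to the prompt as an explicit requirement."""
    return prompt + " Requirements: " + "; ".join(failures)
```

Calling `refine_prompt("Write a function mean(xs).", toy_generate, toy_run_tests, toy_improve)` makes the stub fail on the empty-list test in the first round and pass in the second. Comparing the entries in the returned history then shows which addition to the prompt led to the test pass, which is the kind of improvement item the guidelines are derived from.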
