Statically Contextualizing Large Language Models with Typed Holes

Abstract

Large language models (LLMs) have reshaped the landscape of program synthesis. However, contemporary LLM-based code completion systems often hallucinate broken code because they lack appropriate context, particularly when working with definitions that are neither in the training data nor near the cursor. This paper demonstrates that tight integration with the type and binding structure of a language, as exposed by its language server, can address this contextualization problem in a token-efficient manner. In short, we contend that AIs need IDEs, too! In particular, we integrate LLM code generation into the Hazel live program sketching environment. The Hazel Language Server identifies the type and typing context of the hole being filled, even in the presence of errors, ensuring that a meaningful program sketch is always available. This allows prompting with codebase-wide contextual information that is not lexically local to the cursor, nor necessarily in the same file, but that is likely to be semantically local to the developer's goal. Completions synthesized by the LLM are then iteratively refined via further dialog with the language server. To evaluate these techniques, we introduce MVUBench, a dataset of model-view-update (MVU) web applications. These applications serve as challenge problems because they rely on application-specific data structures. We find that contextualization with type definitions is particularly impactful. After introducing our ideas in the context of Hazel, we replicate our techniques and port MVUBench to TypeScript in order to validate the applicability of these methods to higher-resource languages. Finally, we outline ChatLSP, a conservative extension to the Language Server Protocol (LSP) that language servers can implement to expose capabilities that AI code completion systems of various designs can use to incorporate static context when generating prompts for an LLM.
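The workflow described above (retrieve the hole's expected type and relevant definitions, prompt the model, then feed static errors back for another attempt) can be illustrated with a minimal TypeScript sketch. Every name here (HoleContext, StaticOracle, buildPrompt, fillHole) is hypothetical and chosen for illustration; none of them is taken from Hazel, the Hazel Language Server, or ChatLSP, and the model and language server are replaced by in-memory stubs.

```typescript
// Hypothetical shapes for illustration only; not the Hazel or ChatLSP API.
interface HoleContext {
  expectedType: string;      // type the hole is expected to have
  relevantTypes: string[];   // type definitions reachable from the expected type
  relevantHeaders: string[]; // signatures of bindings in the typing context
}

interface StaticOracle {
  holeContext(sketch: string): HoleContext;
  // Returns static error messages; an empty array means the completion checks.
  check(sketch: string, completion: string): string[];
}

type Model = (prompt: string) => string;

// Assemble a prompt from static context rather than from text near the cursor.
function buildPrompt(sketch: string, ctx: HoleContext, errors: string[]): string {
  const parts = [
    "# Program sketch (?? marks the hole)", sketch,
    "# Expected type of the hole", ctx.expectedType,
    "# Relevant type definitions", ...ctx.relevantTypes,
    "# Relevant headers", ...ctx.relevantHeaders,
  ];
  if (errors.length > 0) {
    parts.push("# Errors in the previous attempt", ...errors);
  }
  return parts.join("\n");
}

// Ask the model for a completion, re-prompting with static errors until the
// completion checks or the attempt budget is exhausted.
function fillHole(sketch: string, oracle: StaticOracle, model: Model, maxAttempts = 3) {
  const ctx = oracle.holeContext(sketch);
  let errors: string[] = [];
  let completion = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    completion = model(buildPrompt(sketch, ctx, errors));
    errors = oracle.check(sketch, completion);
    if (errors.length === 0) {
      return { completion, errors, attempts: attempt };
    }
  }
  return { completion, errors, attempts: maxAttempts };
}

// Stub language server: knows the types of a tiny MVU-style counter app.
const stubOracle: StaticOracle = {
  holeContext: () => ({
    expectedType: "Model",
    relevantTypes: ["type Model = { count: number }"],
    relevantHeaders: ["const update: (model: Model) => Model"],
  }),
  check: (_sketch, completion) =>
    completion.includes("model.total")
      ? ["Property 'total' does not exist on type 'Model'"]
      : [],
};

// Stub model: guesses a wrong field name first, then repairs it once the
// prompt contains error feedback.
const stubModel: Model = (prompt) =>
  prompt.includes("# Errors in the previous attempt")
    ? "({ count: model.count + 1 })"
    : "({ count: model.total + 1 })";

const result = fillHole("const update = (model: Model): Model => ??", stubOracle, stubModel);
console.log(result);
```

In this sketch the first attempt refers to a field that does not exist in the application-specific Model type, and the second attempt succeeds only because the static error is included in the follow-up prompt. A real system would obtain the context and diagnostics from a language server and bound the prompt by a token budget; both are omitted here.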
