TreeRanker: Fast and Model-agnostic Ranking System for Code Suggestions in IDEs

August 4, 2025
Authors: Daniele Cipollone, Egor Bogomolov, Arie van Deursen, Maliheh Izadi
cs.AI

Abstract

Token-level code completion is one of the most critical features in modern Integrated Development Environments (IDEs). It assists developers by suggesting relevant identifiers and APIs during coding. While completions are typically derived from static analysis, their usefulness depends heavily on how they are ranked, as correct predictions buried deep in the list are rarely seen by users. Most current systems rely on hand-crafted heuristics or lightweight machine learning models trained on user logs, both of which leave room for improvement in capturing contextual information and generalizing across projects and coding styles. In this work, we propose a new scoring approach for ranking static completions using language models in a lightweight and model-agnostic way. Our method organizes all valid completions into a prefix tree and performs a single greedy decoding pass to collect token-level scores across the tree. This enables precise token-aware ranking without beam search, prompt engineering, or model adaptation. The approach is fast, architecture-agnostic, and compatible with code completion models that are already deployed. These findings highlight a practical and effective pathway for integrating language models into existing IDE tooling, ultimately providing smarter and more responsive developer assistance.
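
The prefix-tree idea described in the abstract can be illustrated with a small sketch. This is one possible reading of "a single greedy decoding pass", not the authors' implementation: the function names (`build_trie`, `rank_completions`), the pre-tokenized input format, the `floor` penalty, and the toy scorer standing in for a language model are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical scorer interface: given the tokens chosen so far, return
# log-probabilities for possible next tokens. A real system would query a
# language model here; this signature is an assumption, not the paper's API.
NextTokenLogProbs = Callable[[Tuple[str, ...]], Dict[str, float]]


def build_trie(completions: Sequence[Sequence[str]]) -> dict:
    """Organize tokenized completions into a nested-dict prefix tree."""
    root: dict = {}
    for tokens in completions:
        node = root
        for tok in tokens:
            node = node.setdefault(tok, {})
    return root


def rank_completions(
    completions: Sequence[Sequence[str]],
    next_token_logprobs: NextTokenLogProbs,
    floor: float = -20.0,  # assumed penalty for tokens the scorer omits
) -> List[List[str]]:
    """Rank completions using one greedy walk down the prefix tree."""
    node = build_trie(completions)
    prefix: Tuple[str, ...] = ()
    edge_scores: Dict[Tuple[str, ...], float] = {}

    # One scorer call per tree level: score every child of the current node,
    # then descend only into the best-scoring child (the greedy step).
    while node:
        logprobs = next_token_logprobs(prefix)
        for tok in node:
            edge_scores[prefix + (tok,)] = logprobs.get(tok, floor)
        best = max(node, key=lambda t: edge_scores[prefix + (t,)])
        node = node[best]
        prefix += (best,)

    def score(tokens: Sequence[str]) -> float:
        # Sum the scores collected along this completion's path; stop at the
        # first edge the greedy walk never visited.
        total = 0.0
        for i in range(1, len(tokens) + 1):
            key = tuple(tokens[:i])
            if key not in edge_scores:
                break
            total += edge_scores[key]
        return total

    return sorted((list(c) for c in completions), key=score, reverse=True)


# Toy stand-in for a language model, with made-up log-probabilities.
calls: List[Tuple[str, ...]] = []

def toy_scorer(prefix: Tuple[str, ...]) -> Dict[str, float]:
    calls.append(prefix)
    table = {
        (): {"get": -0.2, "set": -1.8},
        ("get",): {"Name": -0.9, "Id": -0.4},
    }
    return table.get(prefix, {})


candidates = [["get", "Name"], ["get", "Id"], ["set", "Name"]]
ranked = rank_completions(candidates, toy_scorer)
```

In this toy run the scorer is called twice, once per tree level on the greedy path, rather than once per candidate. Completions that branch off the greedy path are scored only up to the branching token, so a production system would likely need some normalization for path length; the abstract does not say how the authors handle this.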