TreeRanker: Fast and Model-agnostic Ranking System for Code Suggestions in IDEs

August 4, 2025
Authors: Daniele Cipollone, Egor Bogomolov, Arie van Deursen, Maliheh Izadi
cs.AI

Abstract

Token-level code completion is one of the most critical features in modern Integrated Development Environments (IDEs). It assists developers by suggesting relevant identifiers and APIs during coding. While completions are typically derived from static analysis, their usefulness depends heavily on how they are ranked, as correct predictions buried deep in the list are rarely seen by users. Most current systems rely on hand-crafted heuristics or lightweight machine learning models trained on user logs, which leave room for improvement in capturing contextual information and generalizing across projects and coding styles. In this work, we propose a new scoring approach that ranks static completions using language models in a lightweight and model-agnostic way. Our method organizes all valid completions into a prefix tree and performs a single greedy decoding pass to collect token-level scores across the tree. This enables precise, token-aware ranking without beam search, prompt engineering, or model adaptation. The approach is fast, architecture-agnostic, and compatible with code completion models that are already deployed. These findings highlight a practical and effective pathway for integrating language models into existing IDE tools, ultimately providing smarter and more responsive developer assistance.
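The prefix-tree idea described above can be illustrated with a small Python sketch. It is not the authors' TreeRanker implementation. The `ToyModel` lookup table, the `build_trie` and `rank_completions` helpers, the summed (unnormalized) log-probability score, and the `floor` fallback are all hypothetical choices made here for illustration. The traversal simply visits every trie node rather than reproducing the paper's greedy decoding pass. What it does show is why a prefix tree helps: candidates that share a prefix (such as `get`) also share the model query for that prefix.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical model interface: given a token prefix, return a next-token
# probability distribution. A real system would query a code LM here.
NextTokenFn = Callable[[Tuple[str, ...]], Dict[str, float]]


@dataclass
class TrieNode:
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    completion: Optional[str] = None  # candidate that ends at this node, if any


def build_trie(candidates: Dict[str, Sequence[str]]) -> TrieNode:
    """Insert every candidate's token sequence into one prefix tree."""
    root = TrieNode()
    for name, tokens in candidates.items():
        node = root
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())
        node.completion = name
    return root


def rank_completions(
    context: Sequence[str],
    candidates: Dict[str, Sequence[str]],
    next_token_probs: NextTokenFn,
    floor: float = 1e-9,  # assumed fallback for tokens the model gives no mass
) -> List[Tuple[str, float]]:
    """Score each candidate by the summed log-probability of its tokens.

    The model is queried once per internal trie node, so a prefix shared by
    several candidates is scored only once.
    """
    scores: Dict[str, float] = {}
    stack = [(build_trie(candidates), tuple(context), 0.0)]
    while stack:
        node, prefix, logp = stack.pop()
        if node.completion is not None:
            scores[node.completion] = logp
        if not node.children:
            continue  # leaf: no further model query needed
        probs = next_token_probs(prefix)
        for tok, child in node.children.items():
            p = max(probs.get(tok, 0.0), floor)
            stack.append((child, prefix + (tok,), logp + math.log(p)))
    # Highest total log-probability first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


class ToyModel:
    """Hypothetical stand-in for an LM: it looks at the last token only."""

    def __init__(self, table: Dict[str, Dict[str, float]]):
        self.table = table
        self.calls = 0

    def __call__(self, prefix: Tuple[str, ...]) -> Dict[str, float]:
        self.calls += 1
        return self.table.get(prefix[-1], {})


# Usage: rank three statically valid member completions after "user."
model = ToyModel({
    ".": {"get": 0.6, "set": 0.3, "is": 0.1},
    "get": {"Name": 0.7, "Id": 0.2},
    "set": {"Name": 0.5, "Id": 0.5},
})
candidates = {
    "getName": ["get", "Name"],
    "getId": ["get", "Id"],
    "setName": ["set", "Name"],
}
ranking = rank_completions(["user", "."], candidates, model)
print([name for name, _ in ranking])  # ['getName', 'setName', 'getId']
print(model.calls)  # 3 queries (root, "get", "set") instead of 6
```

Scoring each candidate independently would need one query per token (six here). The trie needs one per distinct prefix (three). Summed log-probabilities favor shorter candidates. Whether and how to normalize for length is a design choice this sketch leaves open.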