

Horizon-Length Prediction: Advancing Fill-in-the-Middle Capabilities for Code Generation with Lookahead Planning

October 4, 2024
作者: Yifeng Ding, Hantian Ding, Shiqi Wang, Qing Sun, Varun Kumar, Zijian Wang
cs.AI

Abstract

Fill-in-the-Middle (FIM) has become integral to code language models, enabling generation of missing code given both left and right contexts. However, the current FIM training paradigm, which reorders original training sequences and then performs regular next-token prediction (NTP), often leads to models struggling to generate content that aligns smoothly with the surrounding context. Crucially, while existing works rely on rule-based post-processing to circumvent this weakness, such methods are not practically usable in open-domain code completion tasks as they depend on restrictive, dataset-specific assumptions (e.g., generating the same number of lines as in the ground truth). Moreover, model performance on FIM tasks deteriorates significantly without these unrealistic assumptions. We hypothesize that NTP alone is insufficient for models to learn effective planning conditioned on the distant right context, a critical factor for successful code infilling. To overcome this, we propose Horizon-Length Prediction (HLP), a novel training objective that teaches models to predict the number of remaining middle tokens (i.e., horizon length) at each step. HLP advances FIM with lookahead planning, enabling models to inherently learn infilling boundaries for arbitrary left and right contexts without relying on dataset-specific post-processing. Our evaluation across different models and sizes shows that HLP significantly improves FIM performance by up to 24% relatively on diverse benchmarks, across file-level and repository-level, and without resorting to unrealistic post-processing methods. Furthermore, the enhanced planning capability gained through HLP boosts model performance on code reasoning. Importantly, HLP only incurs negligible training overhead and no additional inference cost, ensuring its practicality for real-world scenarios.
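The core of HLP, as the abstract describes it, is an auxiliary target at every middle-token position: the number of middle tokens still to be generated. The sketch below illustrates how such targets could be constructed for one FIM training sequence; the function name, the normalization by middle length, and the "remaining after the current token" convention are illustrative assumptions, not the paper's reference implementation.

```python
# Sketch of Horizon-Length Prediction (HLP) targets for a FIM-reordered
# training sequence. At each middle-token position the auxiliary target
# is the number of middle tokens still to come (here: after the current
# token), optionally normalized by the middle length. Positions outside
# the middle get 0.0 and would be masked out of the auxiliary loss.

def horizon_length_targets(seq_len, middle_start, middle_end, normalize=True):
    """Return one horizon-length target per sequence position.

    middle_start/middle_end delimit the middle span [middle_start, middle_end).
    """
    middle_len = middle_end - middle_start
    targets = [0.0] * seq_len
    for t in range(middle_start, middle_end):
        remaining = middle_end - (t + 1)  # middle tokens after position t
        targets[t] = remaining / middle_len if normalize else float(remaining)
    return targets
```

A lightweight prediction head on the model's hidden states would then be trained against these targets jointly with the usual next-token prediction loss; consistent with the abstract, such a head adds only a small training overhead and is discarded at inference time, so decoding cost is unchanged.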
