Horizon-Length Prediction: Advancing Fill-in-the-Middle Capabilities for Code Generation with Lookahead Planning

Abstract

Fill-in-the-Middle (FIM) has become integral to code language models, enabling them to generate missing code given both left and right contexts. However, the current FIM training paradigm, which reorders the original training sequences and then performs regular next-token prediction (NTP), often yields models that struggle to generate content that aligns smoothly with the surrounding context. Existing works rely on rule-based post-processing to circumvent this weakness, but such methods are not practical for open-domain code completion because they depend on restrictive, dataset-specific assumptions (e.g., generating the same number of lines as the ground truth). Moreover, model performance on FIM tasks deteriorates significantly without these unrealistic assumptions. We hypothesize that NTP alone is insufficient for models to learn effective planning conditioned on the distant right context, a critical factor for successful code infilling. To overcome this, we propose Horizon-Length Prediction (HLP), a novel training objective that teaches models to predict the number of remaining middle tokens (i.e., the horizon length) at each step. HLP advances FIM with lookahead planning, enabling models to inherently learn infilling boundaries for arbitrary left and right contexts without relying on dataset-specific post-processing. Our evaluation across different models and sizes shows that HLP improves FIM performance by up to 24% (relative) on diverse benchmarks, at both the file level and the repository level, without resorting to unrealistic post-processing methods. Furthermore, the enhanced planning capability gained through HLP boosts model performance on code reasoning. Importantly, HLP incurs only negligible training overhead and no additional inference cost, ensuring its practicality for real-world scenarios.
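
To make the two ideas in the abstract concrete, the sketch below shows how a training sequence might be reordered for FIM and what per-step horizon-length targets could look like. It is an illustration under stated assumptions, not the authors' implementation: the sentinel strings, the prefix-suffix-middle ordering, the convention that the target at step t counts the middle tokens not yet emitted, and the optional normalization by the middle length are all hypothetical choices made here for clarity.

```python
from typing import List, Tuple

# Hypothetical sentinel strings (assumption); real models define their own special tokens.
PREFIX, SUFFIX, MIDDLE, EOI = "<PRE>", "<SUF>", "<MID>", "<EOI>"


def build_fim_example(tokens: List[str], start: int, end: int) -> Tuple[List[str], List[int]]:
    """Reorder tokens into prefix-suffix-middle order (an assumed layout).

    Returns the reordered sequence and the indices of the middle tokens in it.
    """
    prefix, middle, suffix = tokens[:start], tokens[start:end], tokens[end:]
    sequence = [PREFIX, *prefix, SUFFIX, *suffix, MIDDLE, *middle, EOI]
    # Three sentinels (PREFIX, SUFFIX, MIDDLE) precede the first middle token.
    first_middle = len(prefix) + len(suffix) + 3
    middle_positions = list(range(first_middle, first_middle + len(middle)))
    return sequence, middle_positions


def horizon_targets(num_middle: int, normalize: bool = False) -> List[float]:
    """Horizon length at each middle step.

    Assumed convention: at the step that emits middle token t (0-indexed),
    num_middle - t tokens remain. With normalize=True the count is divided
    by num_middle so targets fall in (0, 1].
    """
    targets = [float(num_middle - t) for t in range(num_middle)]
    if normalize and num_middle > 0:
        targets = [value / num_middle for value in targets]
    return targets


if __name__ == "__main__":
    code = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"]
    sequence, positions = build_fim_example(code, start=8, end=10)
    # Pair each middle token with its horizon-length target.
    for index, target in zip(positions, horizon_targets(len(positions))):
        print(sequence[index], target)
```

In a training setup along these lines, the targets would supervise an auxiliary prediction made at each middle position alongside the usual next-token loss. Because that auxiliary output is not needed at decoding time, this is consistent with the abstract's claim of no additional inference cost.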
