How Much Static Structure Do Code Agents Need? A Study of Deterministic Anchoring

Abstract

LLM-based code agents navigate repositories through keyword search but miss the structural relationships, such as call graphs, inheritance hierarchies, and configuration dependencies, that define how software actually behaves. As a result, agent navigation is stochastic and hard to reproduce across runs. We investigate whether lightweight static analysis can supply deterministic anchors for these agents: stable structural facts, injected as plain-text comments, that constrain probabilistic exploration and make navigation more predictable. Starting from a strong baseline, OpenAI's Codex, we systematically inject structural annotations at varying granularities and measure their effects on localization, trajectory behavior, and run-to-run stability. Our study identifies what we call the deterministic anchoring effect: static structure helps less by making agents "smarter" and more by making their navigation disciplined and reproducible. Three observations support this finding. (1) Anchoring works: lightweight call/inheritance topology improves function-level localization (+2.2 pp Func@5) and shortens trajectories by 1.6 interaction rounds. (2) Anchoring is scale-sensitive: the optimal granularity and directionality depend on repository characteristics; denser semantics show diminishing returns, and hub-heavy projects benefit from inverse-only links that expose "who calls me" without forward edges. (3) Anchoring stabilizes: tags raise the link-following rate from 0.15–0.18 to 0.21–0.24, roughly halve run-to-run variance, and improve single-run reliability on medium-scale repositories (+3.4 pp Pass@1), at the cost of roughly 10% more input tokens. These observations suggest practical guidelines: default to lightweight topology on medium-scale projects, prune forward edges in large repositories, and reserve dense tags for cases with implicit dependencies.
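The abstract describes structural facts injected as plain-text comments, with forward edges ("calls") and inverse edges ("who calls me"), plus an inverse-only variant for hub-heavy repositories. The sketch below illustrates that idea for a single Python file using the standard-library `ast` module. It is an illustration under stated assumptions, not the paper's tooling: the `# [anchor]` tag format, the helper names `extract_edges`, `invert`, and `annotate`, and the bare-name call resolution are all hypothetical.

```python
# Hypothetical tag format and helper names; not the paper's implementation.
import ast
from collections import defaultdict

DEF_NODES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def extract_edges(source: str):
    """Collect call and inheritance edges among names defined in `source`.

    Calls are resolved by bare name only (foo() or obj.foo()), a simplification:
    same-named methods in different classes are not distinguished.
    """
    tree = ast.parse(source)
    defined = {n.name for n in ast.walk(tree) if isinstance(n, DEF_NODES)}
    calls, inherits = defaultdict(set), defaultdict(set)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            for base in node.bases:
                if isinstance(base, ast.Name) and base.id in defined:
                    inherits[node.name].add(base.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for sub in ast.walk(node):
                if not isinstance(sub, ast.Call):
                    continue
                func = sub.func
                if isinstance(func, ast.Name):
                    callee = func.id
                elif isinstance(func, ast.Attribute):
                    callee = func.attr
                else:
                    continue
                if callee in defined and callee != node.name:
                    calls[node.name].add(callee)
    return calls, inherits


def invert(edges):
    """Reverse an adjacency map, e.g. caller -> callees becomes callee -> callers."""
    inverse = defaultdict(set)
    for src, dsts in edges.items():
        for dst in dsts:
            inverse[dst].add(src)
    return inverse


def annotate(source: str, forward: bool = True) -> str:
    """Insert '# [anchor]' comment lines above each def/class.

    forward=False keeps only inverse edges (called-by / subclassed-by).
    """
    calls, inherits = extract_edges(source)
    called_by, subclassed_by = invert(calls), invert(inherits)
    lines = source.splitlines()
    pending = {}  # 0-based row of a def/class line -> comment lines to insert
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, DEF_NODES):
            continue
        edge_sets = [("called-by", called_by), ("subclassed-by", subclassed_by)]
        if forward:
            edge_sets = [("calls", calls), ("inherits", inherits)] + edge_sets
        tags = [
            f"{label}: {', '.join(sorted(edges[node.name]))}"
            for label, edges in edge_sets
            if edges.get(node.name)  # sorted() keeps the tags deterministic
        ]
        if tags:
            row = node.lineno - 1
            indent = lines[row][: node.col_offset]  # match the def's indentation
            pending[row] = [f"{indent}# [anchor] {tag}" for tag in tags]
    out = []
    for row, line in enumerate(lines):
        out.extend(pending.get(row, []))
        out.append(line)
    return "\n".join(out) + "\n"


sample = """\
class Base:
    def run(self):
        return self.step()

    def step(self):
        return 1


class Child(Base):
    pass


def main():
    return Child().run()
"""

print(annotate(sample, forward=False))
```

Setting `forward=False` mirrors the inverse-only option: each definition keeps only `called-by` and `subclassed-by` tags, so in the sample `step` is tagged `called-by: run` while the entry point `main`, which nothing calls, gets no tag at all. Because the tags are ordinary comments, the annotated file still parses and runs unchanged. A realistic tool would also need cross-file resolution and import tracking, which this sketch omits.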