Decoding ML Decision: An Agentic Reasoning Framework for Large-Scale Ranking System

Abstract

Modern large-scale ranking systems operate in a complex landscape of competing objectives, operational constraints, and evolving product requirements. Progress in this domain is increasingly bottlenecked not by modeling techniques alone but by the engineering context constraint: the arduous process of translating ambiguous product intent into reasonable, executable, and verifiable hypotheses. We present GEARS (Generative Engine for Agentic Ranking Systems), a framework that reframes ranking optimization as an autonomous discovery process within a programmable experimentation environment. Rather than treating optimization as static model selection, GEARS uses Specialized Agent Skills to encapsulate ranking expertise as reusable reasoning capabilities, enabling operators to steer systems through high-level intent, which the authors call "vibe personalization". To ensure production reliability, the framework also incorporates validation hooks that enforce statistical robustness and filter out brittle policies that overfit to short-term signals. Experimental validation across diverse product surfaces shows that GEARS consistently identifies superior, near-Pareto-efficient policies by combining algorithmic signals with deep ranking context, while maintaining rigorous deployment stability.

