Decoding ML Decision: An Agentic Reasoning Framework for Large-Scale Ranking System

Abstract

Modern large-scale ranking systems operate within a sophisticated landscape of competing objectives, operational constraints, and evolving product requirements. Progress in this domain is increasingly bottlenecked not by modeling techniques alone but by the engineering context constraint: the arduous process of translating ambiguous product intent into reasonable, executable, and verifiable hypotheses. We present GEARS (Generative Engine for Agentic Ranking Systems), a framework that reframes ranking optimization as an autonomous discovery process within a programmable experimentation environment. Rather than treating optimization as static model selection, GEARS uses Specialized Agent Skills to encapsulate ranking expert knowledge into reusable reasoning capabilities, enabling operators to steer systems via high-level intent ("vibe personalization"). To ensure production reliability, the framework incorporates validation hooks that enforce statistical robustness and filter out brittle policies that overfit short-term signals. Experimental validation across diverse product surfaces demonstrates that GEARS consistently identifies superior, near-Pareto-efficient policies by combining algorithmic signals with deep ranking context, while maintaining rigorous deployment stability.

