TourPlanner: A Competitive Consensus Framework with Constraint-Gated Reinforcement Learning for Travel Planning
January 8, 2026
Authors: Yinuo Wang, Mining Tan, Wenxiang Jiao, Xiaoxi Li, Hao Wang, Xuanyu Zhang, Yuan Lu, Weiming Dong
cs.AI
Abstract
Travel planning is a complex decision-making process that requires synthesizing multifaceted information to construct itineraries. Existing approaches face three challenges: (1) pruning candidate points of interest (POIs) while maintaining high recall; (2) reliance on a single reasoning path, which restricts exploration of the feasible solution space; and (3) the difficulty of simultaneously optimizing hard and soft constraints. To address these challenges, we propose TourPlanner, a comprehensive framework featuring multi-path reasoning and constraint-gated reinforcement learning. Specifically, we first introduce a Personalized Recall and Spatial Optimization (PReSO) workflow to construct a spatially aware set of candidate POIs. We then propose Competitive consensus Chain-of-Thought (CCoT), a multi-path reasoning paradigm that improves exploration of the feasible solution space. To further refine the plan, we integrate a sigmoid-based gating mechanism into the reinforcement learning stage, which dynamically prioritizes soft-constraint satisfaction only after hard constraints are met. Experimental results on travel planning benchmarks demonstrate that TourPlanner achieves state-of-the-art performance, significantly surpassing existing methods in both feasibility and user-preference alignment.
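To make the idea of a sigmoid-gated reward concrete, the following is a minimal sketch and not the paper's actual formulation. It assumes that hard- and soft-constraint satisfaction are each summarized as a score in [0, 1]; the function name `gated_reward` and the parameters `threshold` and `sharpness` are hypothetical choices made for illustration only.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_reward(hard_score: float, soft_score: float,
                 threshold: float = 0.9, sharpness: float = 20.0) -> float:
    """Illustrative constraint-gated reward (assumed form, not from the paper).

    hard_score, soft_score: satisfaction scores assumed to lie in [0, 1].
    threshold: hypothetical hard-constraint level at which the gate is half open.
    sharpness: hypothetical slope controlling how abruptly the gate opens.
    """
    # The gate stays near 0 while hard constraints are poorly satisfied
    # and approaches 1 once hard_score rises above the threshold.
    gate = sigmoid(sharpness * (hard_score - threshold))
    # The soft-constraint term contributes only in proportion to the gate.
    return hard_score + gate * soft_score


if __name__ == "__main__":
    # Hard constraints mostly violated: the soft term is almost fully suppressed.
    print(gated_reward(0.5, 0.8))
    # Hard constraints satisfied: the soft term contributes most of its value.
    print(gated_reward(1.0, 0.8))
```

Under these assumptions, a policy gains little from improving soft-constraint scores until the hard-constraint score nears the threshold, which mirrors the "only after hard constraints are met" behavior described in the abstract. A smooth gate, rather than a hard step, keeps the reward continuous in the hard-constraint score.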