HiMAP-Travel: Hierarchical Multi-Agent Planning for Long-Horizon Constrained Travel

Abstract

Sequential LLM agents struggle with long-horizon planning under hard constraints such as budgets and diversity requirements: as planning progresses and the context grows, they drift away from the global constraints. We propose HiMAP-Travel, a hierarchical multi-agent framework that splits planning into strategic coordination and parallel day-level execution. A Coordinator allocates resources across days, while Day Executors plan their days independently and in parallel. Three mechanisms make this work: a transactional monitor that enforces budget and uniqueness constraints across parallel agents; a bargaining protocol that lets agents reject infeasible sub-goals and trigger re-planning; and a single policy, trained with GRPO, that powers all agents through role conditioning. On TravelPlanner, HiMAP-Travel with Qwen3-8B achieves a Final Pass Rate (FPR) of 52.78% on validation and 52.65% on test. In a controlled comparison with the same model, training, and tools, it outperforms the sequential DeepTravel baseline by 8.67 percentage points (pp). It also surpasses ATLAS by 17.65 pp and MTP by 10.0 pp. On FlexTravelBench multi-turn scenarios, it achieves 44.34% (2-turn) and 37.42% (3-turn) FPR while reducing latency by 2.5× through parallelization.
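The abstract does not say how the transactional monitor is implemented. As an illustration only, the sketch below assumes the monitor guards one shared trip budget plus a set of items (for example restaurants or attractions) that may appear at most once across days, and that each Day Executor submits one finished day plan. `DayPlan`, `TransactionalMonitor`, `try_commit`, and `run_parallel` are hypothetical names, not the authors' API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DayPlan:
    """A candidate plan proposed by one Day Executor (hypothetical schema)."""
    day: int
    cost: float
    items: frozenset  # IDs (e.g. restaurants) that must not repeat across days


@dataclass
class TransactionalMonitor:
    """Hypothetical shared monitor that commits day plans atomically, so
    parallel executors cannot jointly overspend or reuse an item."""
    budget: float
    spent: float = 0.0
    used_items: set = field(default_factory=set)
    committed: dict = field(default_factory=dict)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def try_commit(self, plan: DayPlan) -> tuple:
        # Check and commit under one lock so no other executor can
        # slip in between the feasibility check and the update.
        with self._lock:
            if self.spent + plan.cost > self.budget:
                return plan.day, False, "budget"
            if self.used_items & plan.items:
                return plan.day, False, "duplicate"
            self.spent += plan.cost
            self.used_items |= plan.items
            self.committed[plan.day] = plan
            return plan.day, True, "ok"


def run_parallel(monitor: TransactionalMonitor, proposals: list) -> list:
    """Submit all day plans concurrently. Rejected days would be handed
    back to the Coordinator for re-planning (not modelled here)."""
    with ThreadPoolExecutor(max_workers=len(proposals)) as pool:
        return list(pool.map(monitor.try_commit, proposals))


if __name__ == "__main__":
    monitor = TransactionalMonitor(budget=1000.0)
    proposals = [
        DayPlan(1, 400.0, frozenset({"cafe_a", "museum_x"})),
        DayPlan(2, 350.0, frozenset({"cafe_b", "park_y"})),
        DayPlan(3, 300.0, frozenset({"cafe_c", "museum_x"})),
    ]
    for day, ok, reason in run_parallel(monitor, proposals):
        status = "committed" if ok else f"rejected ({reason})"
        print(f"day {day}: {status}")
    print(f"spent {monitor.spent} of {monitor.budget}")
```

Because each check-and-commit runs under a single lock, the two invariants (total spending within budget, no repeated item) hold no matter which executor finishes first. Which of two conflicting days gets rejected depends on arrival order; in this sketch a rejection is simply returned, whereas the framework's bargaining protocol is what would let a rejected sub-goal trigger re-planning.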