HiMAP-Travel: Hierarchical Multi-Agent Planning for Long-Horizon Constrained Travel

Abstract

Sequential LLM agents fail on long-horizon planning with hard constraints like budgets and diversity requirements. As planning progresses and context grows, these agents drift from global constraints. We propose HiMAP-Travel, a hierarchical multi-agent framework that splits planning into strategic coordination and parallel day-level execution. A Coordinator allocates resources across days, while Day Executors plan independently in parallel. Three key mechanisms enable this: a transactional monitor enforcing budget and uniqueness constraints across parallel agents, a bargaining protocol allowing agents to reject infeasible sub-goals and trigger re-planning, and a single policy trained with GRPO that powers all agents through role conditioning. On TravelPlanner, HiMAP-Travel with Qwen3-8B achieves 52.78% validation and 52.65% test Final Pass Rate (FPR). In a controlled comparison with identical model, training, and tools, it outperforms the sequential DeepTravel baseline by +8.67 pp. It also surpasses ATLAS by +17.65 pp and MTP by +10.0 pp. On FlexTravelBench multi-turn scenarios, it achieves 44.34% (2-turn) and 37.42% (3-turn) FPR while reducing latency 2.5x through parallelization.
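To make the idea of a transactional monitor shared by parallel day-level planners more concrete, here is a minimal Python sketch. It is not the authors' implementation: the names `ConstraintMonitor`, `try_commit`, `plan_day`, and `plan_trip` are hypothetical, the "planner" is a trivial first-fit loop standing in for an LLM agent, and a single lock is assumed to be enough to make each commit atomic.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ConstraintMonitor:
    """Hypothetical shared monitor: a commit either fully applies or is rejected."""

    def __init__(self, total_budget):
        self._lock = threading.Lock()
        self.remaining = total_budget
        self.used_venues = set()

    def try_commit(self, venue, cost):
        # Check and update under one lock so parallel planners cannot
        # overspend the budget or book the same venue twice.
        with self._lock:
            if venue in self.used_venues or cost > self.remaining:
                return False
            self.used_venues.add(venue)
            self.remaining -= cost
            return True


def plan_day(monitor, day, candidates):
    """Stand-in for a Day Executor: take the first candidate the monitor accepts."""
    for venue, cost in candidates:
        if monitor.try_commit(venue, cost):
            return day, venue
    # Nothing feasible: in a fuller system this is where a planner
    # would reject its sub-goal and ask for re-planning.
    return day, None


def plan_trip(total_budget, candidates_per_day):
    """Run one planner per day in parallel against a single shared monitor."""
    monitor = ConstraintMonitor(total_budget)
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(plan_day, monitor, day, candidates)
            for day, candidates in candidates_per_day.items()
        ]
        results = dict(f.result() for f in futures)
    return results, monitor


if __name__ == "__main__":
    plan, monitor = plan_trip(
        100.0,
        {
            1: [("A", 60.0), ("B", 30.0)],
            2: [("A", 60.0), ("C", 50.0)],
            3: [("A", 60.0), ("B", 30.0), ("D", 10.0)],
        },
    )
    print(plan, monitor.remaining)
```

Which day ends up with which venue depends on thread scheduling, so the sketch only guarantees the invariants: no venue is committed twice and the remaining budget never goes negative. A day that returns `None` corresponds loosely to the case the abstract describes, where an agent rejects an infeasible sub-goal and triggers re-planning.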
