HiMAP-Travel: Hierarchical Multi-Agent Planning for Long-Horizon Constrained Travel

Abstract

Sequential LLM agents struggle with long-horizon planning under hard constraints such as budgets and diversity requirements. As planning progresses and the context grows, these agents drift from the global constraints. We propose HiMAP-Travel, a hierarchical multi-agent framework that splits planning into strategic coordination and parallel day-level execution. A Coordinator allocates resources across days, while Day Executors plan their days independently and in parallel. Three key mechanisms enable this: a transactional monitor that enforces budget and uniqueness constraints across the parallel agents, a bargaining protocol that lets agents reject infeasible sub-goals and trigger re-planning, and a single policy trained with GRPO that drives all agents through role conditioning. On TravelPlanner, HiMAP-Travel with Qwen3-8B achieves a Final Pass Rate (FPR) of 52.78% on the validation set and 52.65% on the test set. In a controlled comparison with identical model, training, and tools, it outperforms the sequential DeepTravel baseline by 8.67 percentage points (pp). It also surpasses ATLAS by 17.65 pp and MTP by 10.0 pp. On FlexTravelBench multi-turn scenarios, it achieves an FPR of 44.34% (2-turn) and 37.42% (3-turn) while reducing latency by 2.5x through parallelization.
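To make the idea of a transactional monitor concrete, the following is a minimal sketch and not the authors' implementation. It assumes the monitor is a shared ledger that parallel day-level workers consult before fixing a choice. Each commit atomically checks the remaining global budget and whether the item has already been used on another day. The class `TransactionalMonitor`, the functions `run_day` and `demo`, and the sample restaurants and costs are all hypothetical names and data introduced only for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class TransactionalMonitor:
    """Hypothetical shared ledger: atomic budget and uniqueness checks."""

    def __init__(self, total_budget):
        self._lock = threading.Lock()
        self._remaining = total_budget
        self._used = set()

    def try_commit(self, item, cost):
        """Commit (item, cost) only if the item is unused and fits the budget."""
        with self._lock:  # check and update happen as one atomic step
            if item in self._used or cost > self._remaining:
                return False
            self._used.add(item)
            self._remaining -= cost
            return True

    @property
    def remaining(self):
        with self._lock:
            return self._remaining


def run_day(monitor, candidates):
    """Toy day-level worker: take the first candidate the monitor accepts."""
    for item, cost in candidates:
        if monitor.try_commit(item, cost):
            return item
    # Nothing feasible: in a fuller system this is where a worker could
    # reject its sub-goal and ask the coordinator to re-plan (assumption).
    return None


def demo():
    monitor = TransactionalMonitor(total_budget=100.0)
    day_candidates = [
        [("Restaurant A", 40.0), ("Restaurant B", 30.0)],
        [("Restaurant A", 40.0), ("Restaurant C", 50.0)],
        [("Restaurant A", 40.0), ("Restaurant D", 20.0)],
    ]
    with ThreadPoolExecutor(max_workers=3) as pool:
        chosen = list(pool.map(lambda c: run_day(monitor, c), day_candidates))
    return chosen, monitor.remaining
```

Calling `demo()` returns one choice per day together with the leftover budget. Which day obtains "Restaurant A" depends on thread scheduling, but under this sketch no item is ever chosen twice and the budget never goes negative, because the check and the update sit under the same lock. A day that finds no feasible candidate returns `None`.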
