

RHYTHM: Reasoning with Hierarchical Temporal Tokenization for Human Mobility

September 27, 2025
Authors: Haoyu He, Haozheng Luo, Yan Chen, Qi R. Wang
cs.AI

Abstract

Predicting human mobility is inherently challenging due to complex long-range dependencies and multi-scale periodic behaviors. To address this, we introduce RHYTHM (Reasoning with Hierarchical Temporal Tokenization for Human Mobility), a unified framework that leverages large language models (LLMs) as general-purpose spatio-temporal predictors and trajectory reasoners. Methodologically, RHYTHM employs temporal tokenization to partition each trajectory into daily segments and encode them as discrete tokens with hierarchical attention that captures both daily and weekly dependencies, thereby significantly reducing the sequence length while preserving cyclical information. Additionally, we enrich token representations by adding pre-computed prompt embeddings for trajectory segments and prediction targets via a frozen LLM, and feeding these combined embeddings back into the LLM backbone to capture complex interdependencies. Computationally, RHYTHM freezes the pretrained LLM's backbone to reduce attention complexity and memory cost. We evaluate our model against state-of-the-art methods using three real-world datasets. Notably, RHYTHM achieves a 2.4% improvement in overall accuracy, a 5.0% increase on weekends, and a 24.6% reduction in training time. Code is publicly available at https://github.com/he-h/rhythm.
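The hierarchical temporal tokenization described above — partitioning a trajectory into daily segments, compressing each into one token, then attending across days — can be sketched as follows. This is a minimal illustration of the idea, not the authors' implementation; the mean-pooling segment encoder, single-head attention, and all shapes are simplifying assumptions.

```python
import numpy as np

def daily_tokenize(traj, steps_per_day=48):
    """Split a (T, d) trajectory into daily segments and pool each
    segment into one token, shrinking the sequence by steps_per_day.
    Mean-pooling stands in for the paper's learned segment encoder."""
    T, d = traj.shape
    days = T // steps_per_day
    segments = traj[:days * steps_per_day].reshape(days, steps_per_day, d)
    return segments.mean(axis=1)  # (days, d) day-level tokens

def weekly_attention(day_tokens):
    """Plain softmax self-attention over day tokens, mixing information
    across days to capture cross-day (weekly) dependencies."""
    scores = day_tokens @ day_tokens.T / np.sqrt(day_tokens.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ day_tokens

# Two weeks of half-hourly location features: 14 days x 48 slots
traj = np.random.rand(14 * 48, 16)
tokens = daily_tokenize(traj)       # (14, 16): sequence is 48x shorter
context = weekly_attention(tokens)  # (14, 16): day tokens mixed weekly
```

In the full model these day-level tokens, concatenated with prompt embeddings precomputed by a frozen LLM, would be fed to the LLM backbone; the sketch only shows why tokenizing at the daily scale preserves periodic structure while cutting the attention sequence length.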