

Breaking Data Silos: Towards Open and Scalable Mobility Foundation Models via Generative Continual Learning

June 7, 2025
Authors: Yuan Yuan, Yukun Liu, Chonghua Han, Jie Feng, Yong Li
cs.AI

Abstract

Foundation models have revolutionized fields such as natural language processing and computer vision by enabling general-purpose learning across diverse tasks and datasets. However, building analogous models for human mobility remains challenging due to the privacy-sensitive nature of mobility data and the resulting data silos across institutions. To bridge this gap, we propose MoveGCL, a scalable and privacy-preserving framework for training mobility foundation models via generative continual learning. Without sharing raw data, MoveGCL enables decentralized and progressive model evolution by replaying synthetic trajectories generated from a frozen teacher model, and reinforces knowledge retention through a tailored distillation strategy that mitigates catastrophic forgetting. To address the heterogeneity of mobility patterns, MoveGCL incorporates a Mixture-of-Experts Transformer with a mobility-aware expert routing mechanism, and employs a layer-wise progressive adaptation strategy to stabilize continual updates. Experiments on six real-world urban datasets demonstrate that MoveGCL achieves performance comparable to joint training and significantly outperforms federated learning baselines, while offering strong privacy protection. MoveGCL marks a crucial step toward unlocking foundation models for mobility, offering a practical blueprint for open, scalable, and privacy-preserving model development in the era of foundation models.
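The abstract sketches MoveGCL's training loop at a high level: a frozen teacher model generates synthetic trajectories, and the updated model learns from new institutional data while a distillation term over those replayed trajectories curbs catastrophic forgetting. The paper's exact losses and sampling procedure are not given here, so the PyTorch sketch below is only a plausible reading; `sample_synthetic_trajectories`, `continual_step`, `alpha`, and the temperature `T` are assumed names and hyperparameters, not the authors' API.

```python
# Minimal sketch of generative replay + distillation, assuming an
# autoregressive next-location model that maps token IDs to logits.
# All names and hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F

@torch.no_grad()
def sample_synthetic_trajectories(teacher, start_tokens, max_len):
    """Roll out synthetic trajectories from the frozen teacher."""
    seq = start_tokens                      # (batch, 1) seed location tokens
    for _ in range(max_len - 1):
        logits = teacher(seq)[:, -1, :]     # next-location distribution
        nxt = torch.multinomial(F.softmax(logits, dim=-1), 1)
        seq = torch.cat([seq, nxt], dim=1)
    return seq

def continual_step(student, teacher, new_batch, replay_batch, alpha=0.5, T=2.0):
    """One update: task loss on new data + distillation on replayed data."""
    # Next-location cross-entropy on the new institution's trajectories.
    logits = student(new_batch[:, :-1])
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                         new_batch[:, 1:].reshape(-1))
    # Match the frozen teacher's predictions on synthetic replays, so
    # previously learned mobility knowledge is retained without raw data.
    with torch.no_grad():
        t_logits = teacher(replay_batch[:, :-1])
    s_logits = student(replay_batch[:, :-1])
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                  F.softmax(t_logits / T, dim=-1),
                  reduction="batchmean") * (T * T)
    return ce + alpha * kd
```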
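The Mixture-of-Experts Transformer is likewise described only at a high level ("mobility-aware expert routing"). One way to realize such a router is to gate on mobility features alongside the token hidden state, as in the sketch below; the feature input (`mob_feat`, `d_mob`) and the top-k gating scheme are assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MobilityAwareMoE(nn.Module):
    """Sketch of an MoE feed-forward block whose router sees mobility
    features (e.g., embedded trip statistics) in addition to the hidden
    state. The exact routing signal in MoveGCL is not specified here."""
    def __init__(self, d_model, d_ff, n_experts=4, d_mob=16, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts))
        # Router conditions on both the token state and mobility features.
        self.router = nn.Linear(d_model + d_mob, n_experts)
        self.top_k = top_k

    def forward(self, h, mob_feat):
        # h: (batch, seq, d_model); mob_feat: (batch, seq, d_mob)
        gate_logits = self.router(torch.cat([h, mob_feat], dim=-1))
        weights, idx = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(h)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e         # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(h[mask])
        return out
```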
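Finally, "layer-wise progressive adaptation" plausibly means unfreezing Transformer layers in stages so each continual update perturbs the model gradually; the helper below is one such reading under that assumption, not the paper's confirmed schedule (`model.layers` is an assumed attribute).

```python
def progressive_unfreeze(model, stage, layers_per_stage=2):
    """Freeze all parameters, then unfreeze the top stage*layers_per_stage
    Transformer layers, so deeper layers adapt only in later stages."""
    for p in model.parameters():
        p.requires_grad = False
    n_unfreeze = min(stage * layers_per_stage, len(model.layers))
    for layer in list(model.layers)[len(model.layers) - n_unfreeze:]:
        for p in layer.parameters():
            p.requires_grad = True
```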