Breaking Data Silos: Towards Open and Scalable Mobility Foundation Models via Generative Continual Learning

Abstract

Foundation models have revolutionized fields such as natural language processing and computer vision by enabling general-purpose learning across diverse tasks and datasets. However, building analogous models for human mobility remains challenging due to the privacy-sensitive nature of mobility data and the resulting data silos across institutions. To bridge this gap, we propose MoveGCL, a scalable and privacy-preserving framework for training mobility foundation models via generative continual learning. Without sharing raw data, MoveGCL enables decentralized and progressive model evolution by replaying synthetic trajectories generated from a frozen teacher model, and reinforces knowledge retention through a tailored distillation strategy that mitigates catastrophic forgetting. To address the heterogeneity of mobility patterns, MoveGCL incorporates a Mixture-of-Experts Transformer with a mobility-aware expert routing mechanism, and employs a layer-wise progressive adaptation strategy to stabilize continual updates. Experiments on six real-world urban datasets demonstrate that MoveGCL achieves performance comparable to joint training and significantly outperforms federated learning baselines, while offering strong privacy protection. MoveGCL marks a crucial step toward unlocking foundation models for mobility, offering a practical blueprint for open, scalable, and privacy-preserving model development in the era of foundation models.
