Paris: A Decentralized Trained Open-Weight Diffusion Model
October 3, 2025
Authors: Zhiying Jiang, Raihan Seraj, Marcos Villagra, Bidhan Roy
cs.AI
Abstract
We present Paris, the first publicly released diffusion model pre-trained
entirely through decentralized computation. Paris demonstrates that
high-quality text-to-image generation can be achieved without centrally
coordinated infrastructure. Paris is open for research and commercial use.
Paris required implementing our Distributed Diffusion Training framework from
scratch. The model consists of 8 expert diffusion models (129M-605M parameters
each) trained in complete isolation with no gradient, parameter, or
intermediate activation synchronization. Rather than requiring synchronized
gradient updates across thousands of GPUs, we partition data into semantically
coherent clusters where each expert independently optimizes its subset while
collectively approximating the full distribution. A lightweight transformer
router dynamically selects appropriate experts at inference, achieving
generation quality comparable to centrally coordinated baselines. Eliminating
synchronization enables training on heterogeneous hardware without specialized
interconnects. Empirical validation confirms that Paris's decentralized
training maintains generation quality while removing the dedicated GPU cluster
requirement for large-scale diffusion models. Paris achieves this using
14× less training data and 16× less compute than the prior
decentralized baseline.
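
The recipe summarized in the abstract (partition the data into clusters, train one expert per cluster in isolation, route to an expert at inference) can be illustrated with a toy Python sketch. This is not the authors' implementation, and every choice here is an assumption made for brevity: plain k-means stands in for the semantic clustering, a diagonal Gaussian stands in for each expert diffusion model, and nearest-centroid lookup stands in for the transformer router. The names `kmeans`, `assign`, `GaussianExpert`, and `route` are hypothetical.

```python
import numpy as np

K = 3  # number of experts in this toy example (the abstract reports 8)


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed clustering method). Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(x, centroids)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, assign(x, centroids)


def assign(x, centroids):
    """Index of the nearest centroid for each row of x."""
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


class GaussianExpert:
    """Stand-in for one expert diffusion model: a diagonal Gaussian."""

    def fit(self, shard):
        # Uses only this expert's shard; no gradients, parameters, or
        # activations are exchanged with any other expert.
        self.mean = shard.mean(axis=0)
        self.std = shard.std(axis=0) + 1e-6
        return self

    def sample(self, n, rng):
        return self.mean + self.std * rng.standard_normal((n, self.mean.shape[0]))


def route(query, centroids):
    """Stand-in router: pick the top-1 expert by nearest centroid."""
    return int(assign(query[None, :], centroids)[0])


# Toy "dataset": three well-separated 2-D blobs playing the role of embeddings.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
data = np.concatenate([c + rng.standard_normal((100, 2)) for c in blob_centers])

centroids, labels = kmeans(data, K)

# Each expert is fit independently on its own cluster; in a decentralized
# setting each call could run on a different machine with no communication.
experts = [GaussianExpert().fit(data[labels == j]) for j in range(K)]

# Inference: the router picks one expert, and only that expert generates.
query = np.array([9.5, 0.5])
chosen = route(query, centroids)
samples = experts[chosen].sample(5, rng)
print(chosen, samples.shape)
```

The point of the sketch is structural: the only artifacts shared between training and inference are the cluster assignments and the router, so the experts never need to synchronize with one another.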