Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance

June 17, 2026
Authors: Kangsheng Duan, Ziyang Xu, Wenyu Liu, Xiaohu Ruan, Xiaoxin Chen, Xinggang Wang
cs.AI

Abstract

While 10B-level industrial foundation models have pushed the boundaries of image inpainting, their prohibitive computational costs severely hinder practical deployment. Constructing a highly optimized task-specific specialist offers a promising solution; however, extreme structural compression inevitably triggers a severe representation bottleneck. To conquer this, we propose Moebius, a highly efficient lightweight inpainting framework. We systematically reconstruct the diffusion backbone by introducing the Local-λ Mix Interaction (LλMI) block. Comprising Local-λ and Interactive-λ modules, it elegantly summarizes spatial contexts and global semantic priors into fixed-size linear matrices, preserving complex latent interactions while drastically shedding parameters. Furthermore, to unlock the full representational capacity of this highly compact architecture, we synergistically pair it with an adaptive multi-granularity distillation strategy. Operating strictly within the latent space to avoid expensive pixel-space decoding, this strategy dynamically balances multiple gradient-based losses to achieve high-fidelity alignment. Extensive experiments across natural and portrait benchmarks demonstrate that this optimal synergy enables Moebius to rival or even surpass the generation quality of the 10B-level industrial generalist FLUX.1-Fill-Dev. Remarkably, Moebius achieves this using less than 2% of the parameters (0.22B vs. 11.9B) while delivering a more than 15× acceleration in total inference time, setting a new efficiency standard for high-fidelity inpainting. Project page at https://hustvl.github.io/Moebius.
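
The phrase "summarizes spatial contexts and global semantic priors into fixed-size linear matrices" can be illustrated with a generic lambda-style layer. The NumPy sketch below is not the LλMI block itself, whose details the abstract does not give; it is a minimal illustration, assuming a plain content-lambda formulation in which keys are softmax-normalized over context positions. All names (`content_lambda`, `apply_lambda`, `w_q`, `w_k`, `w_v`) and all dimensions are hypothetical.

```python
import numpy as np

def content_lambda(context, w_k, w_v):
    """Summarize a context of shape (m, d) into a fixed-size (k, v) matrix.

    Assumption: a generic content-lambda formulation, not the Moebius design.
    """
    keys = context @ w_k                            # (m, k)
    values = context @ w_v                          # (m, v)
    keys = keys - keys.max(axis=0, keepdims=True)   # numerical stability
    weights = np.exp(keys)
    weights /= weights.sum(axis=0, keepdims=True)   # softmax over the m positions
    return weights.T @ values                       # (k, v), independent of m

def apply_lambda(inputs, w_q, lam):
    """Apply the summary matrix to each query position as a linear map."""
    queries = inputs @ w_q                          # (n, k)
    return queries @ lam                            # (n, v)

rng = np.random.default_rng(0)
d, k, v = 16, 8, 16                                 # hypothetical sizes
w_q, w_k, w_v = (rng.standard_normal((d, s)) for s in (k, k, v))

# Contexts of very different lengths collapse to the same summary shape.
lam_short = content_lambda(rng.standard_normal((64, d)), w_k, w_v)
lam_long = content_lambda(rng.standard_normal((1024, d)), w_k, w_v)
outputs = apply_lambda(rng.standard_normal((10, d)), w_q, lam_long)

print(lam_short.shape, lam_long.shape, outputs.shape)
```

In this sketch the summary matrix has shape (k, v) whatever the context length, so the per-query cost does not grow with the number of context positions. That is the general property the abstract attributes to fixed-size linear matrices.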