

LFM2 Technical Report

November 28, 2025
Authors: Alexander Amini, Anna Banaszak, Harold Benoit, Arthur Böök, Tarek Dakhran, Song Duong, Alfred Eng, Fernando Fernandes, Marc Härkönen, Anne Harrington, Ramin Hasani, Saniya Karwa, Yuri Khrustalev, Maxime Labonne, Mathias Lechner, Valentine Lechner, Simon Lee, Zetian Li, Noel Loo, Jacob Marks, Edoardo Mosca, Samuel J. Paech, Paul Pak, Rom N. Parnichkun, Alex Quach, Ryan Rogers, Daniela Rus, Nayan Saxena, Bettina Schlager, Tim Seyde, Jimmy T. H. Smith, Aditya Tadimeti, Neehal Tumma
cs.AI

Abstract

We present LFM2, a family of Liquid Foundation Models designed for efficient on-device deployment and strong task capabilities. Using hardware-in-the-loop architecture search under edge latency and memory constraints, we obtain a compact hybrid backbone that combines gated short convolutions with a small number of grouped query attention blocks, delivering up to 2x faster prefill and decode on CPUs compared to similarly sized models. The LFM2 family covers 350M-8.3B parameters, including dense models (350M, 700M, 1.2B, 2.6B) and a mixture-of-experts variant (8.3B total, 1.5B active), all with 32K context length. LFM2's training pipeline includes a tempered, decoupled Top-K knowledge distillation objective that avoids support mismatch; curriculum learning with difficulty-ordered data; and a three-stage post-training recipe of supervised fine-tuning, length-normalized preference optimization, and model merging. Pre-trained on 10-12T tokens, LFM2 models achieve strong results across diverse benchmarks; for example, LFM2-2.6B reaches 79.56% on IFEval and 82.41% on GSM8K. We further build multimodal and retrieval variants: LFM2-VL for vision-language tasks, LFM2-Audio for speech, and LFM2-ColBERT for retrieval. LFM2-VL supports tunable accuracy-latency tradeoffs via token-efficient visual processing, while LFM2-Audio separates audio input and output pathways to enable real-time speech-to-speech interaction competitive with models 3x larger. LFM2-ColBERT provides a low-latency encoder for queries and documents, enabling high-performance retrieval across multiple languages. All models are released with open weights and deployment packages for ExecuTorch, llama.cpp, and vLLM, making LFM2 a practical base for edge applications that need fast, memory-efficient inference and strong task capabilities.
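To make the hybrid backbone concrete, here is a minimal sketch of a gated short-convolution mixer in PyTorch. The block name, the 2x input projection, sigmoid gating, and the depthwise causal convolution are all illustrative assumptions, not the exact LFM2 layer; in the actual architecture such blocks are interleaved with a small number of grouped query attention blocks.

```python
import torch
import torch.nn as nn

class GatedShortConv(nn.Module):
    """Illustrative gated short-convolution mixer (hypothetical, not the exact LFM2 block)."""

    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        self.in_proj = nn.Linear(dim, 2 * dim)  # value and gate paths
        # Depthwise causal convolution: short, fixed receptive field over the sequence.
        self.conv = nn.Conv1d(dim, dim, kernel_size,
                              padding=kernel_size - 1, groups=dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        v, g = self.in_proj(x).chunk(2, dim=-1)
        # Convolve over time, then trim the right-side padding to keep causality.
        v = self.conv(v.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        return self.out_proj(v * torch.sigmoid(g))
```

A short depthwise convolution like this needs only a small rolling buffer of past activations at decode time, which is consistent with the abstract's emphasis on fast, memory-efficient CPU inference.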
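The tempered, decoupled Top-K distillation objective can be sketched similarly. The version below assumes that "avoids support mismatch" means both distributions are renormalized over the teacher's top-k token ids, so teacher and student are always compared on the same support; the function name, the default `k` and temperature `tau`, and the exact form of decoupling are assumptions rather than the report's definition.

```python
import torch
import torch.nn.functional as F

def topk_kd_loss(student_logits: torch.Tensor,
                 teacher_logits: torch.Tensor,
                 k: int = 32, tau: float = 2.0) -> torch.Tensor:
    """Hypothetical tempered Top-K distillation loss over a shared support."""
    # The teacher's top-k token ids define the shared support.
    topk_vals, topk_idx = teacher_logits.topk(k, dim=-1)
    student_topk = student_logits.gather(-1, topk_idx)

    # Tempered log-probabilities, renormalized over the top-k support only,
    # so the student is never penalized on tokens outside the teacher's support.
    log_p_teacher = F.log_softmax(topk_vals / tau, dim=-1)
    log_p_student = F.log_softmax(student_topk / tau, dim=-1)

    # Forward KL from teacher to student; tau**2 restores gradient scale.
    kl = F.kl_div(log_p_student, log_p_teacher,
                  log_target=True, reduction="batchmean")
    return tau ** 2 * kl
```

Usage would follow the standard distillation pattern, e.g. `loss = topk_kd_loss(student(x), teacher(x).detach())`, optionally mixed with the cross-entropy language-modeling loss.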