One-step Language Modeling via Continuous Denoising

Abstract

Language models based on discrete diffusion have attracted widespread interest for their potential to provide faster generation than autoregressive models. In practice, however, they exhibit a sharp degradation of sample quality in the few-step regime, failing to realize this promise. Here we show that language models leveraging flow-based continuous denoising can outperform discrete diffusion in both quality and speed. By revisiting the fundamentals of flows over discrete modalities, we build a flow-based language model (FLM) that performs Euclidean denoising over one-hot token encodings. We show that the model can be trained by predicting the clean data via a cross-entropy objective, and we introduce a simple time reparameterization that greatly improves training stability and generation quality. By distilling FLM into its associated flow map, we obtain a distilled flow map language model (FMLM) capable of few-step generation. On the LM1B and OWT language datasets, FLM attains generation quality matching state-of-the-art discrete diffusion models. With FMLM, our approach outperforms recent few-step language models across the board, with one-step generation exceeding their 8-step quality. Our work calls into question the widely held hypothesis that discrete diffusion processes are necessary for generative modeling over discrete modalities, and paves the way toward accelerated flow-based language modeling at scale. Code is available at https://github.com/david3684/flm.
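To make the training idea in the abstract concrete, the following is a minimal NumPy sketch of denoising over one-hot token encodings with a cross-entropy objective on the clean tokens. It is not the authors' implementation. The linear noise path `(1 - t) * x0 + t * x1`, the function names, and the single linear map standing in for the network are all assumptions made for illustration. The time reparameterization mentioned in the abstract is not specified there and is deliberately left out.

```python
import numpy as np

def one_hot(tokens, vocab_size):
    """Encode integer token ids of shape (seq_len,) as one-hot rows."""
    return np.eye(vocab_size)[tokens]

def noisy_interpolant(x1, t, rng):
    """Blend Gaussian noise x0 with clean one-hot data x1 at time t in [0, 1].

    The linear path is an assumption of this sketch, not taken from the paper.
    """
    x0 = rng.standard_normal(x1.shape)
    return (1.0 - t) * x0 + t * x1

def cross_entropy(logits, tokens):
    """Mean cross-entropy between per-position logits and clean token ids."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(tokens)), tokens].mean()

def toy_denoiser(x_t, weights):
    """Hypothetical stand-in for the network: one linear map from x_t to logits."""
    return x_t @ weights

rng = np.random.default_rng(0)
vocab_size = 5
tokens = np.array([0, 3, 1, 4])           # clean token ids
x1 = one_hot(tokens, vocab_size)          # clean data in Euclidean space
x_t = noisy_interpolant(x1, t=0.7, rng=rng)
weights = np.eye(vocab_size)              # untrained placeholder parameters
loss = cross_entropy(toy_denoiser(x_t, weights), tokens)
```

In a real training loop, `t` would be sampled per example and `loss` would be minimized with respect to the network parameters. The sketch only shows how a continuous noisy input and a discrete clean target fit together in one objective.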
