DARE: Diffusion Large Language Models Alignment and Reinforcement Executor

Abstract

Diffusion large language models (dLLMs) are emerging as a compelling alternative to the dominant autoregressive models, replacing strictly sequential token generation with iterative denoising and parallel generation dynamics. However, their open-source ecosystem remains fragmented across model families and, in particular, across post-training pipelines, where reinforcement learning objectives, rollout implementations, and evaluation scripts are often released as paper-specific codebases. This fragmentation slows research iteration, raises the engineering burden of reproduction, and makes fair comparison across algorithms difficult. We present DARE (dLLMs Alignment and Reinforcement Executor), an open framework for post-training and evaluating dLLMs. Built on top of verl [sheng2024hybridflow] and OpenCompass [2023opencompass], DARE unifies supervised fine-tuning, parameter-efficient fine-tuning, preference optimization, and dLLM-specific reinforcement learning under a shared execution stack for both masked and block diffusion language models. Across representative model families including LLaDA, Dream, SDAR, and LLaDA2.x, DARE provides broad algorithmic coverage, reproducible benchmark evaluation, and practical acceleration. Extensive empirical results position DARE as a reusable research substrate for developing, comparing, and deploying post-training methods for current and emerging dLLMs.
