In recent years, the rapid advancement of large language models (LLMs) has transformed natural language processing, enabling machines to reason, generate, and interact with increasing sophistication. However, the traditional autoregressive paradigm that underpins most LLMs also brings significant limitations, including high inference cost, strictly sequential token-by-token generation, and limited parallelism at decoding time. To address these issues, the research community has been exploring diffusion-based approaches for text generation, leading to the emergence of diffusion language models.
dLLM (Simple Diffusion Language Modeling) is an open-source library designed to unify the training, evaluation, and development of diffusion-based language models. Created by ZHZisZZ and contributors, dLLM aims to bring transparency, reproducibility, and flexibility to the diffusion modeling workflow. With scalable training pipelines, modular evaluation tools, and ready-made recipes for open-weight models like LLaDA, Dream, and BERT-Chat, dLLM is quickly becoming one of the most practical frameworks for researchers and practitioners working on next-generation text generation systems.
This blog provides a complete guide to the dLLM project, including its features, architecture, setup process, training and inference workflows, and real-world applications.
What is dLLM?
dLLM stands for Simple Diffusion Language Modeling, a framework designed to make diffusion-based text modeling accessible and scalable. Unlike conventional autoregressive LLMs, diffusion models generate text by iteratively denoising discrete tokens. This allows improved parallelism, flexibility in conditioning, and potentially better controllability in creative or structured generation tasks.
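The iterative-denoising idea can be illustrated with a toy example (a from-scratch sketch, not dLLM code): start from a fully masked sequence and fill in a few positions per step until no masks remain. The `predict` callable here is a deterministic stand-in for a real model, which would instead unmask the positions it is most confident about.

```python
import random

MASK = "[MASK]"

def toy_denoise(length, steps, predict):
    """Iteratively replace [MASK] tokens, a few per step, the way a
    discrete-diffusion sampler would (toy illustration only)."""
    seq = [MASK] * length
    per_step = max(1, length // steps)
    for _ in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            break
        # unmask a random subset; a real sampler would pick the
        # positions where the model's prediction is most confident
        for i in random.sample(masked, min(per_step, len(masked))):
            seq[i] = predict(i)
    return seq

# A stand-in "model" that deterministically maps position i -> "t{i}"
result = toy_denoise(length=6, steps=3, predict=lambda i: f"t{i}")
print(result)  # ['t0', 't1', 't2', 't3', 't4', 't5']
```

Because several positions are filled per step, generation takes far fewer passes than one-token-at-a-time autoregressive decoding; that is the parallelism advantage in miniature.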
The dLLM framework unifies essential components of the modeling lifecycle:
- Training using scalable pipelines inspired by the Transformers Trainer
- Inference through unified generators abstracting away architecture-specific details
- Evaluation with a harness modeled after lm-evaluation-harness
- Extensible architecture supporting LoRA, DeepSpeed, FSDP, and more
dLLM also provides ready-to-use recipes for training and finetuning cutting-edge diffusion-based LLMs, including:
- LLaDA
- Dream
- EditFlow models
- ModernBERT Chat models
With an MIT license and an active repository, the project serves both academic research and practical experimentation.
Key Features of dLLM
1. Unified Training Pipelines
dLLM integrates a flexible, scalable training pipeline with support for:
- LoRA for parameter-efficient finetuning
- DeepSpeed for distributed optimization
- Fully Sharded Data Parallel (FSDP) training
- 4-bit quantization for both training and inference
These features enable researchers to train large-scale diffusion models on modest hardware or multi-node GPU clusters.
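Why LoRA makes training cheap can be shown in a few lines. This is a from-scratch numeric sketch, independent of dLLM's actual implementation: the frozen weight W is augmented with a low-rank product B·A, so only r·(d_in + d_out) parameters train instead of d_in·d_out.

```python
import random

d_in, d_out, r = 8, 8, 2   # rank r is much smaller than the layer dims
random.seed(0)

def matvec(M, x):
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen pretrained weight
A = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(r)]      # trainable, tiny
B = [[0.0] * r for _ in range(d_out)]                                  # trainable, zero-init

def lora_forward(x, alpha=1.0):
    # frozen path plus low-rank update B @ (A @ x); with B = 0 at
    # init, the adapted layer reproduces the pretrained layer exactly
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + alpha * u for b, u in zip(base, update)]

x = [1.0] * d_in
assert lora_forward(x) == matvec(W, x)

# Trainable parameters: r*(d_in + d_out) = 32 versus full d_in*d_out = 64;
# the gap widens dramatically at real model sizes
print(r * (d_in + d_out), d_in * d_out)
```

At realistic dimensions (say d = 4096, r = 16) the trainable fraction drops below one percent, which is what lets dLLM finetune large diffusion models on modest hardware.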
2. Modular Evaluation System
Inspired by the lm-evaluation-harness, dLLM provides a clean evaluation interface that abstracts away implementation details. Users can evaluate models on tasks including:
- MMLU Pro
- Few-shot benchmarks
- Custom datasets
This ensures reproducible and comparable results across experiments.
3. Extensible Pipelines for Multiple Architectures
The framework includes pipelines for multiple model families:
- LLaDA (pretraining, finetuning, evaluation)
- Dream
- BERT-Chat, demonstrating how masked models like BERT can become chatbots
- EditFlow, supporting insertion, deletion, and substitution operations
These examples serve as templates for building entirely new diffusion architectures.
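The three EditFlow operations map onto plain sequence edits. A minimal sketch (generic Python, not the library's API) makes the operation set concrete:

```python
def apply_edit(tokens, op, pos, token=None):
    """Apply one EditFlow-style operation to a token list:
    insert a token at pos, delete the token at pos, or
    substitute the token at pos."""
    out = list(tokens)
    if op == "insert":
        out.insert(pos, token)
    elif op == "delete":
        del out[pos]
    elif op == "substitute":
        out[pos] = token
    else:
        raise ValueError(f"unknown op: {op}")
    return out

seq = ["the", "cat", "sat"]
seq = apply_edit(seq, "insert", 1, "big")       # ['the', 'big', 'cat', 'sat']
seq = apply_edit(seq, "substitute", 3, "slept") # ['the', 'big', 'cat', 'slept']
seq = apply_edit(seq, "delete", 1)              # ['the', 'cat', 'slept']
print(seq)
```

Because the sequence can grow and shrink, an edit-based model is not locked into a fixed output length, unlike a purely mask-and-fill formulation.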
4. Transparent and Reproducible Development
dLLM ensures transparency by providing:
- Structured file organization
- Dedicated data loaders
- Standardized generation utilities
- Configurable training scripts
Together, these enable predictable experimentation and debugging.
5. ModernBERT Instruction-Following Models
A major recent update includes ModernBERT-large-chat-v0 and ModernBERT-base-chat-v0. These show that BERT-like masked models can be adapted for generative chat tasks using masked instruction tuning. This discovery opens new research pathways for lightweight, efficient, and highly interpretable chat systems.
Repository Structure
The project is well-organized, making it easy to navigate and extend:
- core/: generation utilities, schedulers, and training modules
- pipelines/: application-specific pipelines for BERT, Dream, EditFlow, and LLaDA
- examples/: ready-to-use scripts for pretraining, finetuning, chat demos, evaluation, and sampling
- scripts/: Slurm and accelerate configurations for multi-GPU and distributed training
This modular architecture enables plug-and-play development, whether for research or production-level experimentation.
Installation and Setup
Setting up dLLM is straightforward:
- Create and activate a Python 3.10 environment
- Install PyTorch (CUDA 12.4 recommended)
- Install the dLLM package
- Optionally initialize the evaluation harness submodule
- Optionally configure Slurm for cluster training
The installation steps are designed to work across Linux-based systems with GPU acceleration.
Training with dLLM
Training workflows are designed to be simple yet powerful. A typical script includes:
- Loading model and tokenizer
- Loading datasets
- Initializing the MDLMTrainer
- Running training with accelerate or Slurm
Users can:
- Train with LoRA and 4-bit quantization
- Use subsets or concatenated datasets
- Choose distributed training methods such as DDP, ZeRO, or FSDP
These options allow efficient training even on limited hardware.
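The objective behind a masked-diffusion trainer such as MDLMTrainer can be sketched from scratch: sample a mask ratio (the diffusion "time"), hide that fraction of tokens, and score the model only on the hidden positions. This is a conceptual reimplementation for intuition, not dLLM's actual trainer code.

```python
import math
import random

def masked_diffusion_loss(tokens, model_logprob, mask_ratio=None):
    """One toy training step: mask a random fraction of the tokens and
    average negative log-likelihood over the masked positions only."""
    if mask_ratio is None:
        mask_ratio = random.uniform(0.1, 1.0)   # diffusion "time" t
    n_mask = max(1, int(mask_ratio * len(tokens)))
    masked_pos = random.sample(range(len(tokens)), n_mask)
    corrupted = ["[MASK]" if i in masked_pos else t
                 for i, t in enumerate(tokens)]
    # only masked positions contribute, as in masked-diffusion objectives
    nll = -sum(model_logprob(corrupted, i, tokens[i]) for i in masked_pos)
    return nll / n_mask

# A stand-in "model" that assigns probability 0.5 to every target token
loss = masked_diffusion_loss(["a", "b", "c", "d"],
                             lambda ctx, i, tgt: math.log(0.5))
print(round(loss, 4))  # 0.6931, i.e. -log(0.5)
```

Training over many random mask ratios is what later lets the sampler denoise from any corruption level at inference time.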
Inference with dLLM
Inference is equally streamlined. The framework provides unified generators so that researchers do not need to write architecture-specific inference code. By simply selecting the appropriate generator (e.g., LLaDAGenerator for LLaDA models), one can process chat messages, evaluate capabilities, or run interactive demos.
The chat scripts also support multi-turn conversations, making it straightforward to test conversational performance.
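The unified-generator pattern is easy to mock up. The classes below are illustrative stand-ins (only the name `LLaDAGenerator` comes from the docs): every generator exposes the same `generate(messages)` call, so multi-turn chat code never touches architecture-specific details.

```python
class Generator:
    """Common interface every architecture-specific generator implements."""
    def generate(self, messages):
        raise NotImplementedError

class EchoGenerator(Generator):
    # stand-in for an architecture-backed generator such as one for LLaDA
    def generate(self, messages):
        return f"echo: {messages[-1]['content']}"

def chat_turn(generator, history, user_text):
    """One multi-turn chat step: append the user message, generate a
    reply with whichever generator was selected, append the reply."""
    history.append({"role": "user", "content": user_text})
    reply = generator.generate(history)
    history.append({"role": "assistant", "content": reply})
    return reply

history = []
chat_turn(EchoGenerator(), history, "hello")
chat_turn(EchoGenerator(), history, "how are you?")
print(len(history))  # 4 messages: two user turns plus two assistant turns
```

Swapping model families then means swapping one constructor; the chat loop itself is untouched.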
Evaluation Tools
dLLM integrates directly with a modified evaluation harness. Researchers can:
- Run zero-shot and few-shot tests
- Benchmark across many tasks
- Measure performance on standardized datasets
dLLM also includes scripts to run evaluation on all major benchmarks with a single command, dramatically improving reproducibility.
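The harness idea, stripped to its essence, looks like this (a toy sketch, not the lm-evaluation-harness or dLLM API): register tasks by name, run a model over each example, and report one accuracy per task.

```python
def run_harness(model, tasks):
    """Evaluate `model` (a callable prompt -> answer) on every
    registered task and return {task_name: accuracy}."""
    results = {}
    for name, examples in tasks.items():
        correct = sum(model(prompt) == gold for prompt, gold in examples)
        results[name] = correct / len(examples)
    return results

# Toy tasks in (prompt, gold_answer) form
tasks = {
    "arithmetic": [("1+1=", "2"), ("2+2=", "4")],
    "copy":       [("say hi", "hi")],
}

# A stand-in model that answers arithmetic correctly but fails copying
model = lambda p: str(eval(p[:-1])) if p.endswith("=") else "?"
scores = run_harness(model, tasks)
print(scores)  # {'arithmetic': 1.0, 'copy': 0.0}
```

Keeping the task registry and the model behind fixed interfaces is what makes scores comparable across experiments: any model, diffusion or not, plugs into the same loop.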
Conclusion
dLLM represents a major step forward in diffusion-based language modeling by combining transparent engineering, unified pipelines, modern training recipes, and scalable infrastructure support. With its extensive documentation, sample scripts, and well-structured codebase, dLLM makes it easier than ever to explore diffusion LLMs, experiment with new architectures, and benchmark large models.
Whether you are researching discrete diffusion models, building new LLM architectures, or simply exploring how masked models like BERT can become chatbots, dLLM provides a comprehensive foundation. As diffusion models continue to gain momentum in NLP research, frameworks like dLLM will play a crucial role in scaling development and enabling reproducible, impactful experimentation.