Research · MarkTechPost · 24 June 2026

DFlash Speculative Decoding Drafts Whole Token Blocks in Parallel for Up to 15x Higher Throughput on NVIDIA Blackwell

UC San Diego's DFlash replaces autoregressive drafting with a lightweight block diffusion model for speculative decoding, drafting whole token blocks in one forward pass via KV injection. The paper reports up to 6.08x lossless speedup on Qwen3-8B, while NVIDIA reports up to 15x throughput on Blackwell at fixed interact
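For readers new to the idea, the draft-then-verify loop behind block-level speculative decoding can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not DFlash's implementation: `target_next` and `draft_block` are hypothetical stand-ins for the target model and the block drafter, the drafter's errors are injected artificially, and verification is simulated token by token where a real system would score the whole block in one target forward pass. It does not model block diffusion or KV injection.

```python
from typing import List, Tuple

VOCAB = 11  # toy vocabulary size (assumption for illustration)


def target_next(ctx: List[int]) -> int:
    """Hypothetical stand-in for the target model's greedy next token."""
    return (3 * ctx[-1] + len(ctx)) % VOCAB


def draft_block(ctx: List[int], block_size: int) -> List[int]:
    """Hypothetical drafter: proposes a whole block of tokens at once.

    It mimics the target but is deliberately wrong whenever the running
    context length is a multiple of 4, so some drafts get rejected.
    """
    tmp = list(ctx)
    block = []
    for _ in range(block_size):
        tok = target_next(tmp)
        if len(tmp) % 4 == 0:
            tok = (tok + 1) % VOCAB  # injected drafting error
        block.append(tok)
        tmp.append(tok)
    return block


def speculative_decode(prompt: List[int], num_new: int,
                       block_size: int) -> Tuple[List[int], int]:
    """Greedy draft-then-verify decoding; returns (tokens, verify passes)."""
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < num_new:
        block = draft_block(out, block_size)
        passes += 1  # one verification pass per drafted block
        accepted: List[int] = []
        for tok in block:
            expected = target_next(out + accepted)
            if tok != expected:
                accepted.append(expected)  # target's correction, then stop
                break
            accepted.append(tok)
        else:
            # Every drafted token matched: the target adds one more token.
            accepted.append(target_next(out + accepted))
        out.extend(accepted)
    return out[:len(prompt) + num_new], passes


def greedy_decode(prompt: List[int], num_new: int) -> List[int]:
    """Plain autoregressive baseline: one target call per token."""
    out = list(prompt)
    for _ in range(num_new):
        out.append(target_next(out))
    return out


if __name__ == "__main__":
    tokens, passes = speculative_decode([1, 2], num_new=20, block_size=4)
    print(tokens == greedy_decode([1, 2], 20), passes)
```

Because every kept token is either a draft the target agrees with or the target's own correction, the output matches plain greedy decoding exactly, which is what "lossless" means here. Any speedup comes from needing fewer target passes than generated tokens.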
