Research · MarkTechPost
NVIDIA AI Releases Gated DeltaNet-2: A Linear Attention Layer That Decouples Erase and Write in the Delta Rule
NVIDIA has introduced Gated DeltaNet-2, a linear attention layer that separates memory erasing from memory writing by giving each its own channel-wise gate. Trained at 1.3B parameters on 100B FineWeb-Edu tokens, it reportedly beats Mamba-2, Gated DeltaNet, KDA, and Mamba-3 on language modeling, reasoning, and long-context retrieval benchmarks.
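To make the erase/write distinction concrete, here is a minimal sketch in plain Python. `delta_rule_step` is the classic delta rule, where a single scalar `beta` scales both the removal of the old value and the insertion of the new one. `decoupled_step` is a hypothetical illustration of decoupling, not NVIDIA's published formulation: the function names, the gate names `erase` and `write`, and the choice to apply the gates per key channel are all assumptions made for this example.

```python
def matvec(S, k):
    """Return S @ k for a list-of-rows matrix S and a vector k."""
    return [sum(s_ij * k_j for s_ij, k_j in zip(row, k)) for row in S]


def delta_rule_step(S, k, v, beta):
    """Classic delta rule: one scalar beta scales both erase and write.

    S is a d_v x d_k memory matrix, k a unit-norm key, v a value.
    """
    old = matvec(S, k)  # value currently stored under key k
    return [
        [S[i][j] + beta * (v[i] - old[i]) * k[j] for j in range(len(k))]
        for i in range(len(v))
    ]


def decoupled_step(S, k, v, erase, write):
    """Hypothetical decoupled update (an assumption, not the paper's equation).

    erase and write are per-key-channel gates in [0, 1]: erase controls how
    much of the old value is removed, write how much of the new value is added.
    """
    old = matvec(S, k)
    return [
        [
            S[i][j] - old[i] * erase[j] * k[j] + v[i] * write[j] * k[j]
            for j in range(len(k))
        ]
        for i in range(len(v))
    ]


# Toy example: 2 x 2 memory, unit-norm key (0.6^2 + 0.8^2 = 1).
S = [[1.0, 2.0], [0.5, -1.0]]
k = [0.6, 0.8]
v = [3.0, -2.0]

overwritten = decoupled_step(S, k, v, erase=[1.0, 1.0], write=[1.0, 1.0])
erased_only = decoupled_step(S, k, v, erase=[1.0, 1.0], write=[0.0, 0.0])
```

In this sketch, setting both gates to the same constant recovers the classic delta rule, while `erase=1, write=0` clears whatever was stored under the key without writing anything new, which a single shared coefficient cannot express.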