Research · MarkTechPost ·
MiniMax Sparse Attention (MSA): a Two-Branch Block-Sparse Attention Trained on a 109B-Parameter MoE With a 3T-Token Budget
MiniMax released MSA, a two-branch block-sparse attention method built on grouped query attention. An index branch selects top-k key-value blocks per query and GQA group, while the main branch attends only to those blocks; the system matches GQA benchmarks and cuts per-token attention compute by 28.4x at 1M context.