Research · MarkTechPost
Nous Research Proposes Lighthouse Attention: A Training-Only Selection-Based Hierarchical Attention That Delivers 1.4–1.7× Pretraining Speedup at Long Context
Nous Research introduced Lighthouse Attention, a training-only hierarchical attention method: it is used during pretraining and removed afterward, so inference runs with standard attention. In tests on a 530M-parameter Llama-3-style model at 98K context, the team reported 1.40–1.69× faster wall-clock training than a cuDNN SDPA baseline with similar or lower final loss.
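The announcement does not spell out the algorithm's internals, but as a rough illustration of what "selection-based hierarchical attention" can mean in general, here is a toy NumPy sketch: each query block attends only to the top-k key blocks, chosen by scoring mean-pooled block summaries. Everything here (the pooling, the selection rule, the function names, and the omission of causal masking) is an assumption for illustration, not Lighthouse Attention's actual method.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dense_attention(q, k, v):
    # Standard (non-causal) scaled dot-product attention.
    return softmax(q @ k.T / np.sqrt(q.shape[1])) @ v

def selection_attention(q, k, v, block=4, top_k=2):
    """Toy selection-based sparse attention (assumed scheme, not Lighthouse):
    each query block attends only to the top_k key blocks ranked by a
    mean-pooled block-summary score."""
    T, d = q.shape
    nb = T // block
    # Coarse level: one summary vector per block via mean pooling.
    k_sum = k.reshape(nb, block, d).mean(axis=1)   # (nb, d)
    q_sum = q.reshape(nb, block, d).mean(axis=1)   # (nb, d)
    scores = q_sum @ k_sum.T                       # (nb, nb) block-level scores
    sel = np.argsort(-scores, axis=1)[:, :top_k]   # top_k key blocks per query block
    out = np.empty_like(q)
    for i in range(nb):
        # Fine level: full attention restricted to the selected key tokens.
        idx = np.concatenate([np.arange(j * block, (j + 1) * block) for j in sel[i]])
        qi = q[i * block:(i + 1) * block]
        att = softmax(qi @ k[idx].T / np.sqrt(d))
        out[i * block:(i + 1) * block] = att @ v[idx]
    return out
```

Because softmax attention is permutation-invariant over keys, setting `top_k` equal to the number of blocks recovers dense attention exactly, which makes the sketch easy to sanity-check; with `top_k` small, each query block touches only a fraction of the keys, which is where the training-time savings would come from at long context.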