Research · MarkTechPost
Nous Research Releases Token Superposition Training to Speed Up LLM Pre-Training by Up to 2.5x Across 270M to 10B Parameter Models
Nous Research released Token Superposition Training (TST), a two-phase pretraining method that averages contiguous token embeddings during Phase 1 and returns to standard next-token prediction in Phase 2. The company says it can cut wall-clock training time by up to 2.5x at matched FLOPs without changing model architecture.
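The Phase 1 embedding step can be pictured as collapsing each run of adjacent tokens into a single averaged vector, shortening the sequence the model must process. The sketch below is an illustration only, not Nous Research's implementation: the function name, the superposition width `k`, and the simple non-overlapping grouping are all assumptions, since the announcement only states that contiguous token embeddings are averaged.

```python
import numpy as np

def superpose_embeddings(emb: np.ndarray, k: int) -> np.ndarray:
    """Average each contiguous, non-overlapping group of k token embeddings.

    emb: (seq_len, d) array of token embeddings; any trailing tokens that do
    not fill a full group of k are dropped for simplicity.
    Returns a (seq_len // k, d) array of averaged "superposed" embeddings.
    """
    seq_len, d = emb.shape
    n = (seq_len // k) * k               # truncate to a multiple of k
    groups = emb[:n].reshape(n // k, k, d)
    return groups.mean(axis=1)           # one averaged vector per group

# Example: 6 tokens with 4-dim embeddings, superposition width k=2
emb = np.arange(24, dtype=np.float32).reshape(6, 4)
out = superpose_embeddings(emb, k=2)
print(out.shape)  # (3, 4): the sequence is halved before the transformer
```

One plausible benefit of such a scheme is that a k-fold shorter sequence cuts attention cost roughly quadratically, which would be consistent with the reported wall-clock speedup at matched FLOPs.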