Research · The Decoder · 24 May 2026

ByteDance study finds that asking LMMs questions beats making them transcribe text for long-document training

ByteDance Seed reports that a 7B model answers questions on long, image-heavy documents more reliably than larger models, even on documents four times longer than those it saw in training. According to the study, training on question answering and passage retrieval works better for long-document learning than training on page transcription.
