Research · The Decoder ·
ByteDance study finds that asking LMMs questions beats making it transcribe text for long document training
ByteDance Seed reports that a 7B model can answer questions on long, image-heavy documents more reliably than larger models, even on documents four times longer than its training length. The study says training on question answering and passage finding works better than transcribing pages for long-document learning.