Policy · The Decoder
Microsoft trained its MAI models on unlicensed web data despite promising "enterprise grade, clean and commercially licensed data"
Microsoft trained its MAI models partly on unlicensed web data, including Common Crawl, despite earlier claims that the models used only clean, commercially licensed data. The report says Microsoft relies on fair use and expects site owners to block its crawlers if they do not want their content used.