Research · Hacker News
Five frontier LLMs disagree on 67% of 1k real-world fact-check claims
A study evaluated five frontier large language models (LLMs) on 1,000 real-world fact-check claims and found that the models disagreed with one another on 67% of them. The research highlights inconsistencies in model outputs and raises concerns about the reliability of LLMs for factual tasks.
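
The summary does not say how "disagreed" was defined. One plausible reading, assumed here and not confirmed by the source, is that a claim counts as a disagreement when the five models do not all return the same verdict. Under that reading, 67% of 1,000 claims would be about 670 claims. A minimal Python sketch of that metric follows; the function name `disagreement_rate`, the verdict labels, and the toy data are all hypothetical and are not taken from the study.

```python
from typing import Sequence


def disagreement_rate(verdicts_per_claim: Sequence[Sequence[str]]) -> float:
    """Fraction of claims on which the models did not all return the same verdict.

    Assumption: "disagreement" means at least one model's verdict differs
    from the others. The study may define it differently.
    """
    if not verdicts_per_claim:
        raise ValueError("need at least one claim")
    # A claim is a disagreement if its verdicts contain more than one distinct label.
    disagreements = sum(
        1 for verdicts in verdicts_per_claim if len(set(verdicts)) > 1
    )
    return disagreements / len(verdicts_per_claim)


# Made-up toy data: 3 claims, each judged by 5 hypothetical models.
toy_verdicts = [
    ["true", "true", "true", "true", "true"],      # unanimous
    ["true", "false", "true", "true", "true"],     # one dissenting model
    ["false", "false", "mixed", "false", "true"],  # three distinct verdicts
]

rate = disagreement_rate(toy_verdicts)
print(f"disagreement rate: {rate:.0%}")
```

Note that this all-or-nothing definition treats a single dissenting model the same as a five-way split, so the rate it produces can be high even when a clear majority verdict exists for most claims.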