AI Assistants Struggle with News Accuracy, Study Finds

Key Points
- International study analyzed over 3,000 AI‑generated news answers across 14 languages and 18 countries.
- 45% of responses contained at least one significant issue; 31% had serious sourcing problems and 20% contained major accuracy errors.
- Google Gemini performed worst, with significant issues in 76% of its answers.
- ChatGPT, Microsoft Copilot, and Perplexity also showed notable deficiencies.
- AI assistants often present concise, confident answers that lack transparent sourcing.
- EBU released a toolkit to improve news integrity in AI‑driven platforms.
- Study highlights the need for greater media literacy as AI use for news grows.
An international study led by the BBC and coordinated by the European Broadcasting Union examined how AI assistants handle news queries across 14 languages and 18 countries. The analysis of more than 3,000 responses revealed that nearly half contained at least one significant issue, ranging from poor sourcing to outright inaccuracies. Google Gemini performed the worst, with significant issues in 76% of its replies, while other tools such as ChatGPT, Microsoft Copilot, and Perplexity also displayed notable shortcomings. The findings point to persistent weaknesses in AI-generated news content and underscore the need for greater media literacy and transparency.
Scope and Methodology of the Study
The international investigation, spearheaded by the BBC and coordinated by the European Broadcasting Union (EBU), evaluated how four major AI assistants, ChatGPT, Microsoft Copilot, Google Gemini, and Perplexity, answer news-related questions. Researchers gathered more than 3,000 individual responses across 14 languages and 18 countries. Professional journalists from 22 public-media organizations assessed each answer for accuracy, sourcing, and how well it distinguished factual reporting from opinion.
Key Findings on Accuracy and Sourcing
The study found that 45% of all AI-generated answers contained at least one significant issue. Across the full sample, 31% showed serious sourcing problems and 20% contained major accuracy errors, from misattributed quotes to hallucinated details that never appeared in the original reporting. These problems occurred consistently across languages, regions, and platforms, indicating systemic weaknesses rather than isolated incidents.
Performance Variations Among Assistants
Among the tools examined, Google Gemini fared the worst: 76% of its responses contained significant issues, largely due to missing or inadequate sourcing. The other assistants also showed deficiencies, but none approached Gemini's error rate. The findings suggest that the conversational format of AI assistants can create an illusion of authority even when the underlying content is flawed.
Implications for Users and the Media Landscape
With an estimated 7% of online news consumers now turning to AI assistants for information—rising to 15% among those under 25—the study raises concerns about the reliability of AI‑mediated news delivery. Users often receive concise, confident answers that lack the transparency of traditional search results, making it harder to verify the information presented.
Calls for Greater Transparency and Media Literacy
The EBU and its partners responded by releasing a “News Integrity in AI Assistants Toolkit,” aimed at helping developers and journalists identify and address common failures in AI‑generated news. The toolkit emphasizes the importance of clear sourcing, contextual nuance, and user awareness of potential inaccuracies.
Looking Ahead
As AI developers continue to refine their models, the study underscores the necessity for ongoing scrutiny, improved accountability mechanisms, and heightened media literacy among the public. While AI assistants can streamline access to information, the findings remind users to approach AI‑generated news with caution and to verify claims through reliable sources.