Study Finds 35% of New Websites Use AI, Driving an Overly Cheerful Tone Online

Wired AI

Key Points

  • 35% of websites launched between 2022 and 2025 rely on AI‑generated or AI‑assisted content.
  • AI‑written pages exhibit a 107% higher positive sentiment score than human‑written sites.
  • Semantic analysis shows AI‑generated pages are about 33% more similar to one another, reducing ideological diversity.
  • The study found no evidence of increased misinformation linked to AI‑generated sites.
  • AI‑generated pages link to external sources at rates comparable to human‑written pages.
  • Researchers expected a generic writing style but did not find significant flattening.
  • Public poll respondents anticipated more fake news and less linking, contrary to findings.
  • The work uses Wayback Machine snapshots and Pangram Labs detection tools.

A preprint study released by researchers from Imperial College London, Stanford University and the Internet Archive reveals that roughly 35 percent of websites launched between 2022 and 2025 rely on AI-generated or AI-assisted content. The analysis shows that AI‑written pages carry a markedly higher positive sentiment, making the web feel artificially upbeat. The same work finds that AI content reduces ideological diversity, while several expected side effects—such as a rise in misinformation or a drop in external linking—did not materialize. The findings challenge common assumptions about the impact of large language models on online discourse.

Researchers from Imperial College London, Stanford University and the Internet Archive have published a preprint that quantifies the spread of artificial‑intelligence‑generated content across the public web. By sampling snapshots from the Wayback Machine, the team identified that about 35 percent of sites created from 2022 through 2025 were either fully AI‑generated or heavily assisted by large language models.

To reach that figure, the investigators tested four detection approaches before settling on a tool from Pangram Labs, which delivered the most consistent results despite acknowledged imperfections. The sample, drawn from the Internet Archive’s massive repository, was intended to represent the broader ecosystem of new web pages.
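The paper's detection pipeline isn't reproduced here, but the arithmetic behind the headline share is straightforward. A minimal sketch, assuming a hypothetical detector that labels each sampled snapshot as "ai", "assisted", or "human" (the label scheme is illustrative, not Pangram Labs' actual output format):

```python
from collections import Counter

def ai_share(labels):
    """Fraction of sampled pages flagged as AI-generated or AI-assisted.

    `labels` is a list of per-page strings such as "ai", "assisted",
    or "human" -- a stand-in for whatever a detector actually emits.
    """
    counts = Counter(labels)
    flagged = counts["ai"] + counts["assisted"]
    return flagged / len(labels) if labels else 0.0

# Toy sample: 7 of 20 snapshots flagged, i.e. a 35% share.
sample = ["ai"] * 4 + ["assisted"] * 3 + ["human"] * 13
print(f"{ai_share(sample):.0%}")  # → 35%
```

At web scale the sampling and labeling are the hard parts; the share itself is just this ratio.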

The study’s most striking headline is the surge in positive language. Sentiment analysis shows that AI‑written pages score roughly 107 percent higher on positive sentiment than their human‑crafted counterparts. The researchers describe the effect as an "artificial cheerfulness" that makes the overall tenor of online writing feel saccharine.
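The summary does not specify which sentiment model the researchers used. As an illustration of how a "percent higher positive sentiment" figure could be computed, here is a toy lexicon-based scorer; the word lists and the `relative_lift` helper are invented for this sketch, not the study's method:

```python
# Toy sentiment lexicons -- illustrative only, far smaller than any real model.
POSITIVE = {"great", "amazing", "delightful", "wonderful", "exciting"}
NEGATIVE = {"bad", "boring", "awful", "dull", "poor"}

def sentiment_score(text):
    # Net positive-word rate per token: (positives - negatives) / total words.
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)

def relative_lift(ai_texts, human_texts):
    # Percent by which the mean AI score exceeds the mean human score.
    ai_mean = sum(map(sentiment_score, ai_texts)) / len(ai_texts)
    human_mean = sum(map(sentiment_score, human_texts)) / len(human_texts)
    return (ai_mean - human_mean) / human_mean * 100
```

A reported "107 percent higher" figure would mean the AI group's average score is a bit more than double the human group's, under whatever scoring scheme the authors used.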

Beyond tone, the analysis suggests a narrowing of viewpoints. Using semantic similarity metrics, the team found that AI‑driven sites are about 33 percent more alike in content than human‑written sites, indicating a shrinkage in ideological diversity across the web.
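The exact similarity metric isn't given in this summary. A minimal sketch of one common approach, mean pairwise cosine similarity over bag-of-words vectors (all names here are illustrative; the study may well have used learned embeddings instead):

```python
import math
from collections import Counter
from itertools import combinations

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def mean_pairwise_similarity(texts):
    # Average similarity over every pair of documents in the group.
    vecs = [Counter(t.lower().split()) for t in texts]
    pairs = list(combinations(vecs, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)
```

Comparing this statistic for an AI-written group against a human-written group is one way a "33 percent more alike" claim could be operationalized.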

Several hypotheses the researchers brought into the study did not hold up. Contrary to popular belief, the data did not show a spike in misinformation linked to AI content. Likewise, AI‑generated pages were just as likely to include outbound links as human‑written ones, and the writing style did not flatten into a generic voice.

"Everyone on the team expected that to be true," said Stanford researcher Maty Bohacek, noting the surprise at the lack of evidence for a stylistic homogenization. "But we just don’t have significant evidence for that." The unexpected findings highlight how assumptions about large language models can outpace empirical reality.

Before the technical work began, the team commissioned a public poll on attitudes toward AI‑written content. Respondents largely anticipated a rise in fake news, a decline in external linking and a uniform, bland writing style—outcomes that the study ultimately did not confirm. The mismatch between perception and measurement underscores a broader gap in public understanding of AI’s real‑world effects.

The authors stress that this research is an early step, not a final verdict, on how AI reshapes the internet. They hope the data will spur deeper investigations into the nuanced ways large language models influence both the tone and the diversity of online discourse.

#AI-generated content #large language models #internet research #sentiment analysis #semantic similarity #misinformation #Wayback Machine #Stanford University #Imperial College London #Internet Archive
Generated with News Factory - Source: Wired AI