Polite Replies Signal AI Bots, Study Shows

Key Points
- Study conducted by researchers from Zurich, Amsterdam, Duke and NYU.
- Introduced a computational Turing test using automated classifiers.
- Evaluated nine open‑weight large language models across multiple platforms.
- Classifiers detected AI‑generated replies with 70‑80 percent accuracy.
- Overly polite, friendly tone emerged as the most reliable AI indicator.
- AI replies showed consistently lower toxicity scores than human comments.
- Optimization methods reduced some differences but not emotional tone.
- Findings highlight the usefulness of affective cues for AI detection.
Researchers from the University of Zurich, University of Amsterdam, Duke University and NYU released a study revealing that AI‑generated social‑media replies are often marked by an overly friendly emotional tone. Testing nine open‑weight large language models on posts from Twitter/X, Bluesky and Reddit, the team found that automated classifiers could detect the AI‑generated replies with 70 to 80 percent accuracy. The models also displayed consistently lower toxicity scores than human users, and attempts to fine‑tune the models or refine their prompts did not eliminate the emotional‑tone giveaway.
University Collaboration Uncovers AI Tell‑Tale
Researchers from four institutions – the University of Zurich, the University of Amsterdam, Duke University and New York University – conducted a systematic analysis of large language models (LLMs) operating on popular social‑media platforms. Their goal was to determine how closely AI‑generated replies resemble authentic human comments and to identify reliable markers that distinguish the two.
Computational Turing Test Framework
The team introduced a “computational Turing test,” an automated classification framework that replaces subjective human judgment with objective linguistic analysis. They fed real‑world posts from Twitter/X, Bluesky and Reddit to nine open‑weight models to generate reply texts, then used automated classifiers to try to distinguish those machine‑written replies from authentic human responses.
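The study's own code is not reproduced here, but the core of such a computational Turing test can be sketched as a supervised text classifier trained to separate machine-written from human-written replies. The example below uses scikit-learn with TF-IDF features; the feature choice and the toy data are illustrative assumptions, not the authors' exact pipeline.

```python
# Minimal sketch of a "computational Turing test": a supervised classifier that
# learns to separate AI-generated replies (label 1) from human ones (label 0).
# TF-IDF features and the toy data are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

human_replies = [
    "lol no way that actually happened",
    "this take is terrible tbh",
    "trains late again, shocker",
]
ai_replies = [
    "That's a wonderful perspective, thank you for sharing!",
    "I really appreciate your point of view, have a great day!",
    "What a thoughtful post, I completely understand how you feel.",
]

texts = human_replies + ai_replies
labels = [0] * len(human_replies) + [1] * len(ai_replies)

vectorizer = TfidfVectorizer(ngram_range=(1, 2), lowercase=True)
classifier = LogisticRegression(max_iter=1000)
classifier.fit(vectorizer.fit_transform(texts), labels)

# Score a new reply: probability that it is machine-written.
new_reply = "Thank you so much for this thoughtful comment, wishing you well!"
prob_ai = classifier.predict_proba(vectorizer.transform([new_reply]))[0, 1]
print(f"P(AI-generated) = {prob_ai:.2f}")
```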
Models Evaluated and Accuracy Results
The study examined a diverse set of models, including Llama 3.1 (8B, 8B Instruct, 70B), Mistral 7B (v0.1, Instruct v0.2), Qwen 2.5 7B Instruct, Gemma 3 4B Instruct, DeepSeek‑R1‑Distill‑Llama‑8B and Apertus‑8B‑2509. Across all platforms, the classifiers identified AI‑generated replies with an accuracy ranging from 70 percent to 80 percent.
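As a rough illustration of the reply-generation step, the sketch below prompts one of the evaluated open-weight model families through the Hugging Face transformers library. The model ID, system prompt and sampling settings are assumptions for illustration and do not reflect the paper's exact configuration (the model is gated and needs substantial hardware in practice).

```python
# Sketch of the reply-generation step: prompt an open-weight model with a real
# post and collect its reply. Model ID, prompt wording, and sampling settings
# are illustrative assumptions, not the study's setup.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",  # one of the evaluated model families
)

post = "Commute was a nightmare again, trains delayed for over an hour."
messages = [
    {"role": "system", "content": "You are a regular social media user. "
                                  "Write a short reply to the following post."},
    {"role": "user", "content": post},
]

output = generator(messages, max_new_tokens=60, do_sample=True, temperature=0.8)
# With chat-style input, the pipeline returns the full conversation; the last
# message is the model's generated reply.
print(output[0]["generated_text"][-1]["content"])
```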
Emotional Tone as a Persistent Indicator
Analysis revealed that the most consistent differentiator was affective tone. AI outputs tended to be overly polite, friendly and emotionally restrained, contrasting sharply with the casual negativity and spontaneous emotional expression typical of human users. This “politeness” signal persisted even after the researchers applied various optimization strategies, such as providing writing examples, fine‑tuning, or contextual retrieval.
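To make the tone comparison concrete, the sketch below scores a handful of replies with the off-the-shelf VADER sentiment analyzer; the tool choice and the sample replies are assumptions for illustration, not necessarily the instruments or data used in the study.

```python
# Sketch of comparing affective tone between human and AI replies using the
# VADER sentiment analyzer from NLTK. Tool choice is an illustrative assumption.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
from statistics import mean

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

human_replies = ["ugh this is so annoying", "nah, hard disagree"]
ai_replies = ["What a lovely thought, thank you for sharing it!",
              "I really appreciate your perspective, have a great day!"]

def mean_tone(replies):
    """Average compound sentiment (-1 = very negative, +1 = very positive)."""
    return mean(sia.polarity_scores(r)["compound"] for r in replies)

print("human tone:", round(mean_tone(human_replies), 2))
print("AI tone:   ", round(mean_tone(ai_replies), 2))
```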
Lower Toxicity Scores in AI Replies
In addition to tone, the study measured toxicity—a metric of hostile or harmful language. AI‑generated replies consistently scored lower on toxicity than authentic human comments, indicating a reluctance of current models to produce the more abrasive language often found in everyday social‑media discourse.
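A toxicity comparison of this kind can be approximated with an open-source toxicity classifier such as Detoxify, as in the sketch below; the study's exact toxicity metric may differ, so this is an illustrative substitute rather than the authors' method.

```python
# Sketch of scoring replies for toxicity with the open-source Detoxify model.
# The tool choice and sample replies are illustrative assumptions.
from statistics import mean
from detoxify import Detoxify

model = Detoxify("original")

human_replies = ["that's the dumbest take I've read all week",
                 "get lost with this nonsense"]
ai_replies = ["I see it differently, but thanks for sharing your view!",
              "That's an interesting point, I appreciate the discussion."]

def mean_toxicity(replies):
    """Average probability that a reply is toxic, according to Detoxify."""
    return mean(model.predict(replies)["toxicity"])

print("human toxicity:", round(mean_toxicity(human_replies), 3))
print("AI toxicity:   ", round(mean_toxicity(ai_replies), 3))
```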
Optimization Attempts and Limits
The research team experimented with several calibration techniques aimed at reducing structural differences like sentence length or word count. While these adjustments narrowed some gaps, the emotional‑tone disparity remained robust. The authors concluded that simply making models larger or more finely tuned does not automatically yield human‑like emotional expression.
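The contrast between structural features, which calibration can shift, and affective cues, which it largely could not, can be illustrated with a few simple counts over replies. The positive-word list below is a made-up proxy for friendly tone, not a measure taken from the paper.

```python
# Sketch contrasting structural features (word and character counts) with a
# crude tone proxy (positive-word frequency). Word list and data are
# illustrative assumptions only.
from statistics import mean

POSITIVE_WORDS = {"thank", "thanks", "great", "wonderful", "appreciate", "lovely"}

def describe(replies):
    word_counts = [len(r.split()) for r in replies]
    positive_hits = [
        sum(w.strip("!,.").lower() in POSITIVE_WORDS for w in r.split())
        for r in replies
    ]
    return {
        "mean_words": mean(word_counts),
        "mean_chars": mean(len(r) for r in replies),
        "mean_positive_words": mean(positive_hits),
    }

human = ["trains were late again, classic", "not reading all that"]
ai = ["Thanks for sharing, that sounds really frustrating!",
      "What a wonderful way to look at it, I appreciate your post!"]

print("human:", describe(human))
print("AI:   ", describe(ai))
```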
Implications for Detection and Trust
These findings suggest that platforms and users can rely on affective cues—especially an unusually polite or friendly tone—to flag potential AI‑generated content. The study challenges the assumption that advanced optimization will erase all detectable signatures of machine‑authored text, underscoring the need for continued development of detection tools.