Researchers Argue Bad Evaluation Incentives Drive AI Hallucinations

Are bad incentives to blame for AI hallucinations?
TechCrunch

Key Points

  • OpenAI paper examines why large language models still hallucinate.
  • Hallucinations are defined as plausible but false statements.
  • Pre‑training focuses on next‑word prediction without truth labels.
  • Low‑frequency facts are especially prone to errors.
  • Current evaluation rewards exact answers, encouraging guesses.
  • Proposed scoring penalizes confident mistakes and rewards uncertainty.
  • Negative scoring for wrong answers is suggested to deter guessing.
  • Redesigning incentives could reduce hallucinations in future AI.

A new paper from OpenAI examines why large language models such as GPT‑5 and ChatGPT continue to produce plausible but false statements, known as hallucinations. The authors explain that pretraining encourages models to predict the next word without distinguishing truth from falsehood, leading to errors on low‑frequency facts. They also argue that current evaluation methods reward correct answers regardless of confidence, prompting models to guess rather than express uncertainty. The paper proposes redesigning scoring systems to penalize confident mistakes, reward appropriate uncertainty, and discourage blind guessing, aiming to reduce hallucinations in future AI systems.

Background on AI Hallucinations

OpenAI has released a research paper investigating why large language models such as GPT‑5 and the ChatGPT chatbot continue to generate hallucinations, which it defines as statements that sound credible but are factually incorrect. Despite steady advances, the paper notes, these errors remain a fundamental challenge for all large language models.

Illustrative Errors

The researchers highlight concrete examples in which a model was asked for the title of a specific researcher's Ph.D. dissertation and for that researcher's birthday. In each case the model supplied three different answers, all incorrect, underscoring its tendency to fabricate details confidently.

Root Causes in Pre‑training

The authors attribute a key source of hallucinations to the pre‑training objective, which focuses solely on predicting the next word in a sequence. This objective lacks true‑or‑false labels, exposing the model only to positive examples of fluent language. While this approach captures common patterns like spelling and punctuation, it struggles with arbitrary low‑frequency facts that cannot be inferred from patterns alone, resulting in fabricated statements.
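
To make the point concrete, here is a minimal sketch in Python of a toy next-word predictor fit purely from word co-occurrence counts. The sentences, the bigram model, and the loss function are illustrative assumptions, not OpenAI's training setup; the sketch only shows that a next-word-prediction loss scores conflicting "facts" identically, with nothing marking either as false.

```python
# Minimal sketch (illustrative assumption, not OpenAI's training code):
# a bigram "language model" fit by next-word prediction alone.
import math
from collections import Counter, defaultdict

# Hypothetical training sentences containing conflicting low-frequency "facts".
corpus = [
    "the researcher was born in 1975",
    "the researcher was born in 1980",
]

# Count bigrams: the model learns only which word tends to follow which.
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        bigram_counts[prev][nxt] += 1

def next_word_probs(prev):
    """Probability of each candidate next word given the previous word."""
    counts = bigram_counts[prev]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def neg_log_likelihood(sentence):
    """Next-word-prediction loss: no term asks whether the sentence is true."""
    words = sentence.split()
    return sum(-math.log(next_word_probs(prev).get(nxt, 1e-9))
               for prev, nxt in zip(words, words[1:]))

print(neg_log_likelihood("the researcher was born in 1975"))  # ~0.69
print(neg_log_likelihood("the researcher was born in 1980"))  # ~0.69
# Both statements are equally "good" under the objective; at most one is true.
```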

Evaluation Incentives and Model Behavior

Beyond the training phase, the paper argues that the way models are evaluated reinforces hallucinations. Current evaluation metrics reward models for achieving high accuracy on exact‑answer tests, encouraging them to guess when uncertain rather than admitting lack of knowledge. The researchers compare this to multiple‑choice exams where random guessing can yield a correct answer, while leaving a question blank guarantees zero points.
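
As a rough illustration of that incentive (our own toy arithmetic, not a calculation from the paper), consider the expected score under accuracy-only grading: a blind guess on a four-option question earns 0.25 points in expectation, while abstaining earns nothing, so a model tuned against such a benchmark should always guess.

```python
# Toy expected-score arithmetic (our illustration, not from the paper):
# under accuracy-only grading, guessing strictly beats abstaining.

def expected_score(p_correct, reward_correct=1.0, penalty_wrong=0.0):
    """Expected points for answering when p_correct is the chance of being right."""
    return p_correct * reward_correct + (1 - p_correct) * penalty_wrong

p_guess = 0.25  # e.g. a blind guess on a four-option multiple-choice question

print(expected_score(p_guess))  # 0.25 expected points for guessing
print(0.0)                      # 0 points for answering "I don't know"
# Any nonzero chance of being right makes guessing the scoring-optimal policy.
```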

Proposed Changes to Scoring

To mitigate this issue, the paper suggests redesigning evaluation scoring to penalize confident errors more heavily than expressions of uncertainty. It recommends offering partial credit when a model admits it does not know and applying negative scores to wrong answers, much as some standardized tests discourage blind guessing. Aligning incentives with truthful reporting in this way would push models to prioritize accuracy over speculative confidence.
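
A sketch of what such a rubric could look like follows; the specific reward, penalty, and partial-credit values are assumptions chosen for illustration, not figures from the paper. With a negative score for wrong answers and partial credit for abstaining, guessing has negative expected value unless the model is reasonably confident.

```python
# Sketch of an incentive-aligned rubric (the reward, penalty, and partial-credit
# values are assumptions for illustration, not figures from the paper).

def expected_guess_score(p_correct, reward_correct=1.0, penalty_wrong=-1.0):
    """Expected points for answering when wrong answers are penalized."""
    return p_correct * reward_correct + (1 - p_correct) * penalty_wrong

CREDIT_ABSTAIN = 0.25  # partial credit for an honest "I don't know"

p_guess = 0.25  # model's chance of guessing the exact answer correctly
print(expected_guess_score(p_guess))  # -0.5: guessing now loses points on average
print(CREDIT_ABSTAIN)                 # +0.25: expressing uncertainty scores better here

# Break-even confidence above which answering beats abstaining:
# p* = (credit_abstain - penalty_wrong) / (reward_correct - penalty_wrong)
p_star = (CREDIT_ABSTAIN - (-1.0)) / (1.0 - (-1.0))
print(p_star)  # 0.625 in this toy setup
```

Under a rubric like this, the model's best strategy is to answer only when its confidence exceeds the break-even threshold, which is precisely the behavior the authors want evaluations to encourage.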

Implications for Future AI Development

The authors stress that adding a handful of uncertainty‑aware tests alongside existing benchmarks is not enough; the dominant accuracy‑based evaluations themselves must be overhauled to change model behavior fundamentally. Implementing these incentive‑aligned metrics could reduce hallucinations and improve the reliability of AI systems in real‑world applications.

#OpenAI  #AI hallucinations  #large language models  #GPT-5  #ChatGPT  #model evaluation  #machine learning incentives  #pretraining  #accuracy scoring  #uncertainty handling