OpenAI Evaluates GPT‑5 Models for Political Bias

OpenAI released details of an internal stress‑test aimed at measuring political bias in its chatbot models. The test, conducted on 100 topics with prompts ranging from liberal to conservative and charged to neutral, compared four models—including the newer GPT‑5 instant and GPT‑5 thinking—to earlier versions such as GPT‑4o and OpenAI o3. Results show the GPT‑5 models reduced bias scores by about 30 percent and handled charged prompts with greater objectivity, though moderate bias still appears in some liberal‑charged queries. The company says bias now occurs infrequently and at low severity, while noting ongoing political pressures on AI developers.

Background

OpenAI announced a new internal assessment designed to gauge the political neutrality of its ChatGPT models. The effort follows months of development and a broader campaign to address complaints that earlier versions exhibited partisan slant, particularly from conservative observers.

Testing Methodology

The company constructed a set of 100 topics—such as immigration and pregnancy—drawn from party agendas and culturally salient issues. Each topic was presented to the chatbot in five distinct ways, ranging from liberal to conservative and from charged to neutral. The test was run across four models: the older GPT‑4o and OpenAI o3, and the newer GPT‑5 instant and GPT‑5 thinking.

To evaluate responses, a separate large‑language model applied a rubric that flags rhetorical techniques OpenAI deems biased. Criteria include placing user phrasing in “scare quotes” (user invalidation), using language that escalates a political stance, presenting the bot’s own viewpoint, offering only one side of an issue, or refusing to engage.

Key Findings

OpenAI reports that bias now appears “infrequently and at low severity.” Moderate bias shows up mainly in charged prompts, especially those with a liberal slant. The company notes that “strongly charged liberal prompts exert the largest pull on objectivity across model families, more so than charged conservative prompts.”

When comparing models, the GPT‑5 instant and GPT‑5 thinking versions performed better than GPT‑4o and OpenAI o3. The newer models achieved roughly a 30 percent lower bias score overall and showed improved resistance to pressure from charged prompts. When bias did emerge, it typically manifested as personal opinion, emotional escalation, or emphasis on a single side of an issue.

Context and Implications

OpenAI has previously offered users the ability to adjust the tone of ChatGPT and published a “model spec” outlining intended behaviors. The current test follows political scrutiny from the Trump administration, which issued an executive order urging agencies to avoid “woke” AI models and pressing AI firms to make their systems more conservative‑friendly. OpenAI’s topic categories include “culture & identity” and “rights & issues,” areas highlighted in the administration’s concerns.

While the new GPT‑5 models demonstrate measurable progress toward political neutrality, OpenAI acknowledges that completely eliminating bias remains a challenge. The company’s ongoing testing framework aims to keep bias low as the technology evolves.

OpenAI Evaluates GPT‑5 Models for Political Bias

Key Points

Background

Testing Methodology

Key Findings

Context and Implications

Also available in: