OpenAI Unveils GPT-5.4 with Pro and Thinking Variants

Key Points
- OpenAI released GPT-5.4 with three versions: standard, Pro (high performance), and Thinking (advanced reasoning).
- The model supports a context window of up to one million tokens, the largest offered by OpenAI.
- Token efficiency improvements enable tasks to be solved with fewer tokens than previous models.
- The model set records in computer‑use benchmarks and scored 83% on the GDPval knowledge‑work test.
- In professional benchmarks, GPT-5.4 led assessments of legal and financial tasks.
- Error rates dropped 33% for individual claims and 18% for overall responses versus GPT‑5.2.
- The Thinking variant showed reduced risk of deceptive chain‑of‑thought behavior in safety tests.
- A new Tool Search system reduces token overhead when accessing many tools in the API.
- OpenAI emphasizes the model’s suitability for high‑impact professional workloads.
OpenAI announced the release of GPT-5.4, its newest foundation model designed for professional workloads. The model is offered in three versions—a standard release, a high‑performance Pro edition, and a reasoning‑focused Thinking edition. GPT-5.4 features a context window of up to one million tokens and delivers significant token‑efficiency gains, allowing it to solve tasks with fewer tokens than prior models. Benchmarks show record performance in computer‑use and knowledge‑work tests, while safety updates cut hallucinations by roughly one‑third. A new tool‑calling architecture called Tool Search reduces token overhead when accessing many tools, and a safety evaluation demonstrates lower risk of deceptive chain‑of‑thought behavior in the Thinking version.
New Model Family
OpenAI introduced GPT-5.4 as its most capable and efficient frontier model for professional work. The offering includes three distinct versions: the standard GPT-5.4, GPT-5.4 Pro, which is optimized for high performance, and GPT-5.4 Thinking, tailored for advanced reasoning tasks. All three share a dramatically enlarged context window that can handle up to one million tokens, providing the largest token capacity currently available from OpenAI.
Token Efficiency and Performance Gains
OpenAI highlighted that GPT-5.4 can solve the same problems using significantly fewer tokens than its predecessor. This token‑efficiency improvement translates into faster, cheaper processing for complex applications. Benchmark testing shows record scores in computer‑use evaluations such as OSWorld‑Verified and WebArena Verified, and the model achieved an 83% result on OpenAI’s GDPval test for knowledge‑work tasks. In professional benchmarks like Mercor’s APEX‑Agents, which assess legal and financial skill sets, GPT-5.4 led the rankings, demonstrating strong ability to generate long‑horizon deliverables such as slide decks, financial models, and legal analysis.
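The practical effect of token efficiency is easy to estimate with simple arithmetic. The sketch below uses purely illustrative numbers—the per‑token prices, token counts, and the one‑third reduction are assumptions for the sake of the example, not published OpenAI figures:

```python
# Illustrative cost comparison: same task, fewer output tokens.
# All prices and token counts below are hypothetical placeholders,
# not published OpenAI pricing.

def task_cost(input_tokens: int, output_tokens: int,
              price_in_per_m: float, price_out_per_m: float) -> float:
    """Dollar cost of one request at per-million-token prices."""
    return (input_tokens * price_in_per_m +
            output_tokens * price_out_per_m) / 1_000_000

# Hypothetical: the older model needs 12,000 output tokens for a task
# the newer model solves in 8,000 (a one-third reduction).
old = task_cost(4_000, 12_000, price_in_per_m=2.0, price_out_per_m=8.0)
new = task_cost(4_000, 8_000, price_in_per_m=2.0, price_out_per_m=8.0)

print(f"old: ${old:.3f}  new: ${new:.3f}  saved: {100 * (1 - new / old):.0f}%")
# → old: $0.104  new: $0.072  saved: 31%
```

Because output tokens typically dominate the bill for generation‑heavy tasks, even a modest reduction in tokens per task compounds into meaningful savings at scale.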
Reduced Hallucinations and Safer Output
Continuing its focus on reliability, OpenAI reported that GPT-5.4 is 33% less likely than GPT‑5.2 to make errors in individual claims, and its overall responses are 18% less likely to contain errors. A new safety evaluation targeting chain‑of‑thought behavior showed that the Thinking version is less prone to deceptive reasoning, suggesting that the model does not reliably conceal its thought process and that chain‑of‑thought monitoring remains an effective safety tool.
Tool Search: A New Approach to Tool Calling
The API version of GPT-5.4 introduces a system called Tool Search, which changes how the model accesses tool definitions. Previously, system prompts had to list all available tools, consuming many tokens as the toolset grew. Tool Search allows the model to look up definitions only when needed, reducing token usage and lowering request costs in environments with many tools.
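The motivation behind Tool Search can be sketched with a simple token‑budget comparison. Everything below is a hypothetical illustration of the idea—the tool names, definition sizes, and per‑tool index cost are invented for the sketch and do not reflect OpenAI's actual API:

```python
# Hypothetical illustration of the Tool Search idea: instead of placing
# every tool definition in the prompt, the model retrieves only the
# definitions it needs. Tool names and token sizes are invented.

TOOL_DEFS = {f"tool_{i}": 300 for i in range(200)}  # ~300 tokens per definition

def upfront_tokens(defs: dict) -> int:
    """Old approach: every definition rides along in the system prompt."""
    return sum(defs.values())

def on_demand_tokens(defs: dict, needed: list) -> int:
    """Tool-Search-style approach: a short index entry per tool, plus
    the full definition only for tools the model actually looks up."""
    index_entry = 10  # hypothetical per-tool name/summary cost
    return len(defs) * index_entry + sum(defs[name] for name in needed)

old = upfront_tokens(TOOL_DEFS)                           # 60,000 tokens
new = on_demand_tokens(TOOL_DEFS, ["tool_3", "tool_42"])  # 2,600 tokens
print(old, new)
```

Under these assumed numbers, a 200‑tool environment drops from 60,000 tokens of definitions per request to a few thousand, and the savings grow with the size of the toolset—which is the pattern the announcement describes.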
Implications for Professional AI Use
By combining a massive context window, superior token efficiency, record benchmark performance, and stronger safety mechanisms, GPT-5.4 positions itself as a versatile engine for a wide range of professional applications. The Pro and Thinking variants give developers the flexibility to prioritize speed or deep reasoning, while the new Tool Search architecture streamlines integration with complex tool ecosystems. OpenAI’s announcements signal a continued push toward more capable, cost‑effective, and trustworthy AI systems for enterprise and research use.