OpenAI Unveils GPT-5.4 with Native Computer Use and Expanded Context Window

Key Points
- OpenAI released GPT-5.4 in three configurations: standard, Thinking, and Pro.
- Benchmark tests show the model matching or exceeding professionals in many tasks.
- Native computer use enables the model to operate software and perform multi‑step workflows.
- Context window expanded to a 1‑million‑token limit for full‑document processing.
- New tool‑search system reduces token usage by retrieving tool definitions on demand.
- Safety evaluation (CoT Controllability) indicates low ability to hide reasoning.
- Launch occurs during heightened competition among frontier AI models.

OpenAI released GPT-5.4, a new frontier model offered in three configurations for general, reasoning-intensive, and high‑demand workloads. The model shows benchmark gains across professional tasks, introduces native computer use, and expands the context window to a 1‑million‑token limit. A redesigned tool‑search system reduces token usage, and a new safety evaluation tests chain‑of‑thought controllability. The launch positions GPT-5.4 as OpenAI’s most capable model for professional work while highlighting ongoing competition at the AI frontier.
Model Launch and Configurations
OpenAI announced GPT-5.4, describing it as the company’s most capable and efficient frontier model for professional work. The model is available in three versions: a standard release for general use, a "Thinking" variant designed for tasks that benefit from extended chain‑of‑thought reasoning, and a "Pro" version aimed at the highest‑demand workloads. The "Thinking" option is accessible to Plus, Team, and Pro subscribers, while the "Pro" tier is reserved for higher‑priced ChatGPT plans.
Benchmark Performance
According to OpenAI’s internal evaluations, GPT-5.4 matches or exceeds industry professionals in a majority of task comparisons, improving on previous releases. On a desktop‑navigation benchmark, the model’s success rate surpassed the reported human baseline. It also topped a professional‑task benchmark that assesses sustained workflows in fields such as investment banking and corporate law. OpenAI reports fewer factual errors and hallucinations than in earlier releases.
New Capabilities
The most significant addition is native computer use, allowing the model to operate software, navigate file systems, and execute multi‑step workflows without external agentic frameworks. This capability is built into the general‑purpose model, simplifying integration for developers. The API also supports a context window of up to 1 million tokens, more than double the previous limit, enabling full‑context processing of large documents, codebases, and financial records.
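To illustrate what a 1‑million‑token window makes practical, the sketch below checks whether a large document fits in context before submitting it in a single request. The `CONTEXT_LIMIT` constant reflects the reported window size; the ~4‑characters‑per‑token heuristic and the `fits_in_context` helper are illustrative assumptions, not part of any OpenAI API.

```python
# Rough feasibility check before submitting a large document in one request.
# Uses the common ~4-characters-per-token heuristic for English text; a real
# integration would use a tokenizer (e.g. tiktoken) for exact counts.

CONTEXT_LIMIT = 1_000_000  # reported 1-million-token window

def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def fits_in_context(document: str, reserved_for_output: int = 8_000) -> bool:
    """True if the document plus an output budget fits inside the window."""
    return estimate_tokens(document) + reserved_for_output <= CONTEXT_LIMIT

# A ~2-million-character filing (~500k tokens) fits in one request; a
# ~6-million-character corpus (~1.5M tokens) would still need chunking.
print(fits_in_context("x" * 2_000_000))  # True
print(fits_in_context("x" * 6_000_000))  # False
```

Under the previous, smaller limit, the first document would also have required chunking or retrieval; the larger window removes that step for many single‑document workloads.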
Efficiency Improvements
A redesigned tool‑search system lets the model retrieve tool definitions on demand, cutting token usage by nearly half in internal tests. This reduction translates to lower costs and faster responses for large‑scale agentic systems.
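The on‑demand retrieval idea can be sketched in a few lines: rather than attaching every tool schema to every request, a registry returns only the definitions relevant to the current query. The catalog, stopword list, and keyword matching below are a minimal sketch of the general pattern, not OpenAI's implementation.

```python
import json

# Minimal sketch of on-demand tool retrieval: keep the full catalog of tool
# definitions to one side and attach only the relevant schemas to each
# request, instead of serializing every schema into every prompt.

TOOL_CATALOG = {
    "get_weather": {
        "description": "Look up the current weather for a city.",
        "parameters": {"city": "string"},
    },
    "search_files": {
        "description": "Search the local file system for matching files.",
        "parameters": {"pattern": "string"},
    },
    "send_email": {
        "description": "Send an email to a recipient.",
        "parameters": {"to": "string", "body": "string"},
    },
}

STOPWORDS = {"the", "a", "an", "is", "in", "to", "for", "of", "what"}

def retrieve_tools(query: str, catalog: dict = TOOL_CATALOG) -> dict:
    """Return only tools whose description shares a content word with the query."""
    words = set(query.lower().split()) - STOPWORDS
    return {
        name: spec
        for name, spec in catalog.items()
        if words & set(spec["description"].lower().split())
    }

# Only the weather tool is serialized into the prompt for this query,
# shrinking the token footprint versus sending the whole catalog.
selected = retrieve_tools("what is the weather in Oslo")
print(len(json.dumps(selected)) < len(json.dumps(TOOL_CATALOG)))  # True
```

A production system would match on embeddings or an index rather than shared keywords, but the token saving comes from the same place: the prompt carries a handful of schemas instead of the full catalog on every call.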
Safety Evaluation
OpenAI introduced an open‑source evaluation called CoT Controllability, which tests whether the model can deliberately obscure its reasoning to evade monitoring. The results suggest the model shows low ability to hide its chain‑of‑thought, which OpenAI frames as a positive safety signal.
Competitive Landscape
The release arrives amid intense competition from other frontier AI models, each leading in different benchmark categories. While GPT-5.4 leads on desktop computer use and professional knowledge‑work tasks, other models excel in coding or abstract reasoning. OpenAI’s rapid release cadence underscores its strategy of staying visible in a fast‑moving market.