OpenAI Leverages Cerebras Wafer-Scale Chip to Boost Codex Speed

Key Points

  • OpenAI and Cerebras partner to run Codex‑Spark on the Wafer Scale Engine 3 chip.
  • Codex‑Spark achieves roughly 1,000 tokens per second, with higher rates on other models.
  • The collaboration reflects OpenAI’s effort to lessen reliance on Nvidia hardware.
  • OpenAI signed a multi‑year AMD deal in October 2025 and a $38 billion cloud pact with Amazon.
  • A planned $100 billion Nvidia infrastructure deal stalled; Nvidia later pledged $20 billion.
  • Speed is a critical factor for developers using AI coding assistants.
  • OpenAI’s recent releases include GPT‑5.2 and GPT‑5.3‑Codex after internal “code red” concerns.
  • Competition from Anthropic, Google and others intensifies the focus on latency.

OpenAI has teamed with Cerebras to run its Codex-Spark coding model on the Wafer Scale Engine 3, a chip the size of a dinner plate. The partnership aims to improve inference speed, delivering roughly 1,000 tokens per second, with higher rates reported on other models. The move reflects OpenAI’s broader strategy of reducing its reliance on Nvidia by striking deals with AMD and Amazon and by developing its own custom silicon. The faster coding assistant arrives amid fierce competition from Anthropic, Google and other AI firms, underscoring how much latency matters to developers building software.

Partnership and New Hardware

OpenAI announced a collaboration with Cerebras that brings its Codex‑Spark coding model to the Wafer Scale Engine 3. The processor, described as the size of a dinner plate, is Cerebras’ core hardware offering, and the Codex‑Spark deployment is the first product to emerge from the partnership, which was announced earlier this year.

Performance Benchmarks

Codex‑Spark delivers about 1,000 tokens per second, a speed OpenAI calls modest by Cerebras standards. Cerebras has measured 2,100 tokens per second on Llama 3.1 70B and 3,000 tokens per second on OpenAI’s open‑weight gpt‑oss‑120B model, suggesting the lower figure reflects Codex‑Spark’s larger or more complex design.
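To put those throughput figures in perspective, here is a minimal back-of-envelope sketch in Python. The 2,000-token response size (roughly a few hundred lines of generated code) is an illustrative assumption, not a figure from the article.

```python
# Back-of-envelope: wall-clock time for one response at each reported throughput.
# The 2,000-token response size below is an assumed, illustrative figure.

REPORTED_THROUGHPUT = {
    "Codex-Spark (Wafer Scale Engine 3)": 1_000,  # tokens per second
    "Llama 3.1 70B (Cerebras)": 2_100,
    "gpt-oss-120B (Cerebras)": 3_000,
}

RESPONSE_TOKENS = 2_000  # assumed length of a single generated code completion

for model, tokens_per_second in REPORTED_THROUGHPUT.items():
    seconds = RESPONSE_TOKENS / tokens_per_second
    print(f"{model:40s} ~{seconds:.1f} s per {RESPONSE_TOKENS}-token response")
```

Under these assumptions, even the “modest” 1,000-tokens-per-second rate returns a multi-hundred-line completion in about two seconds, which is the practical gap the partnership is trying to close.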

Why Speed Matters

AI‑driven coding assistants have had a breakout year, with tools such as OpenAI’s Codex and Anthropic’s Claude Code becoming increasingly useful for rapid prototyping, interface design and boilerplate generation. Faster inference translates directly into quicker developer iteration: at 1,000 tokens per second, the experience feels, as developers put it, like working with a “rip saw” rather than a slower, more laborious process.

Competitive Landscape

The coding‑assistant market is crowded. OpenAI, Anthropic, Google and other firms are racing to ship more capable agents, and latency has become a key differentiator. OpenAI recently rolled out GPT‑5.3‑Codex, its follow‑up to December’s GPT‑5.2, after an internal “code red” memo highlighted competitive pressure from Google.

Reducing Dependence on Nvidia

OpenAI has been systematically diversifying its hardware suppliers. The company signed a multi‑year deal with AMD in October 2025, entered a $38 billion cloud‑computing agreement with Amazon in November, and is designing its own custom AI chip for eventual fabrication by TSMC. A planned $100 billion infrastructure deal with Nvidia has stalled, though Nvidia later committed a $20 billion investment. Reuters reported that OpenAI grew dissatisfied with the speed of some Nvidia chips for inference tasks, a shortfall Codex‑Spark aims to address.

Implications for Developers

For developers spending hours inside a code editor waiting for AI suggestions, the speed gains offered by Codex‑Spark could meaningfully reduce friction. While the performance numbers are still modest compared with Cerebras’ top benchmarks, the partnership signals OpenAI’s commitment to delivering faster, more responsive coding tools as part of a broader hardware diversification strategy.

#OpenAI #Cerebras #Codex #WaferScaleEngine #AICodingAssistants #Nvidia #AMD #Amazon #Anthropic #Google #AIHardware
Generated with News Factory - Source: Ars Technica
