Microsoft Launches Three In-House AI Models, Signaling Shift From OpenAI Partnership

Key Points
- Microsoft released three in‑house AI models—MAI-Transcribe-1, MAI-Voice-1, MAI-Image-2—on its Foundry platform.
- MAI-Transcribe-1 averages a 3.8% word‑error rate across 25 languages on the FLEURS benchmark, ahead of OpenAI, Google and ElevenLabs models on most of those languages.
- MAI-Voice-1 generates 60 seconds of audio in under one second and supports custom voice creation from minimal samples.
- MAI-Image-2 placed third on the Arena.ai text‑to‑image leaderboard, behind only Google and OpenAI models.
- The models were built by a ten‑person team within Microsoft’s MAI Superintelligence unit, led by Microsoft AI CEO Mustafa Suleyman.
- A September 2025 contract renegotiation with OpenAI gave Microsoft the freedom to develop competing models.
- Foundry now serves over 80,000 enterprises, including about 80% of Fortune 500 companies.
- OpenAI remains Microsoft’s largest AI partner, but both firms now compete on the same platform.
Six months after renegotiating its contract with OpenAI, Microsoft unveiled MAI-Transcribe-1, MAI-Voice-1 and MAI-Image-2 on its Foundry platform. The new models, built by a ten‑person team, boast lower error rates, faster speeds and competitive pricing, giving the tech giant a functional AI stack that does not depend on OpenAI. The rollout underscores Microsoft’s new freedom to pursue "humanist superintelligence" and could reshape enterprise AI spending.
Microsoft announced the public release of three home‑grown artificial‑intelligence models, MAI-Transcribe-1, MAI-Voice-1 and MAI-Image-2, on its Foundry platform, marking the first tangible output of the company’s MAI Superintelligence team. The models arrive just weeks after executive‑level changes that freed Microsoft AI CEO Mustafa Suleyman from day‑to‑day product duties, allowing him to focus on building a suite of AI tools that operate entirely on Microsoft infrastructure.
MAI-Transcribe-1, a speech‑to‑text system, posts what Microsoft says is the lowest word‑error rate across 25 languages on the FLEURS benchmark, averaging 3.8 percent. According to the company, it outperforms OpenAI’s Whisper‑large‑v3 on every language, beats Google’s Gemini 3.1 Flash on 22 of the 25 languages, and surpasses ElevenLabs’ Scribe v2 on 15. The model runs 2.5 times faster than the company’s previous Azure Fast transcription service and is priced at $0.36 per hour of audio. The development team behind it numbered just ten people.
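For readers unfamiliar with the metric, word-error rate is the number of word substitutions, deletions and insertions needed to turn a transcript into the reference text, divided by the number of reference words. The Python sketch below illustrates that definition and the per-hour pricing arithmetic. It is not Microsoft's evaluation code: the function names are hypothetical, and it skips the text normalization that benchmark scoring usually applies first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Price quoted in the article; treated here as a flat rate (an assumption).
PRICE_PER_AUDIO_HOUR_USD = 0.36


def transcription_cost_usd(audio_hours: float) -> float:
    """Estimated cost of transcribing the given number of audio hours."""
    return audio_hours * PRICE_PER_AUDIO_HOUR_USD
```

Under this definition, a 3.8 percent rate corresponds to roughly four word errors per 100 reference words, and at the quoted flat rate 1,000 hours of audio would cost about $360.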
MAI-Voice-1 completes the audio pipeline. The text‑to‑speech model can generate a minute of natural‑sounding audio in under a second on a single GPU and supports custom voice creation from only a few seconds of sample audio. Paired with MAI-Transcribe-1 and a customer‑chosen large language model, it forms a full voice solution that does not rely on any OpenAI technology.
The third offering, MAI-Image-2, entered the Arena.ai text‑to‑image leaderboard in March at number three, trailing only Google’s Gemini 3.1 Flash and OpenAI’s GPT Image 1.5. Developed with input from photographers, designers and visual storytellers, the model is already being used at scale by WPP, one of the world’s largest marketing groups.
The releases are more than technical milestones; they reflect a strategic pivot enabled by a September 2025 contract renegotiation with OpenAI. The new memorandum of understanding granted Microsoft licensing rights to all OpenAI outputs through 2032, secured $250 billion in additional Azure cloud commitments and, crucially, removed the clause that barred Microsoft from building its own general‑purpose AI models. Suleyman cited the renegotiation as the catalyst that allowed the company to pursue its "humanist superintelligence" agenda.
Microsoft’s Foundry platform—formerly Azure AI Foundry and Azure AI Studio—now serves over 80,000 enterprises, including roughly 80 percent of Fortune 500 companies. That distribution advantage means the MAI models do not need to dominate every benchmark to shift enterprise spending toward Microsoft‑built solutions. They simply have to be competitive enough for customers to choose an integrated option over third‑party alternatives.
OpenAI finds itself in a nuanced position. While Microsoft remains its largest investor and primary cloud provider, the two firms now share a platform that hosts both OpenAI and Microsoft models. OpenAI’s own fundraising round in February, which raised $110 billion and valued the company independently of Microsoft, suggests the partnership is evolving into one in which the two companies also compete directly.
The broader AI landscape mirrors this fragmentation. Anthropic’s recent $30 billion raise and Google’s rapid Gemini iterations underline a market no longer dominated by a single frontier AI provider. Microsoft’s new model family adds a fourth heavyweight to the mix, giving enterprises more choices and signaling that the era of an exclusive OpenAI‑Microsoft AI pipeline is ending.
Suleyman cautions that the current models are foundational. He expects the superintelligence team to deliver frontier‑class language models within the next year or two, but for now the trio provides Microsoft with its own voice, ears and eyes—an independent AI stack that could reshape how businesses allocate AI spend.