Mustafa Suleyman has been preparing for his new job description for a long time. Suleyman was Microsoft’s first CEO of AI, but he handed off some of those responsibilities when the company went through a major restructuring in mid-March, and he shifted his focus to the pursuit of superintelligence. Although the news was only announced last month, he tells The Verge he had been preparing for the transition for nine months. He began planning it before the ink was even dry on the renegotiated contract with OpenAI, which officially “unlocked [Microsoft’s] ability to pursue superintelligence.”
“This has been a long-standing plan,” he said, adding that achieving superintelligence “was purely my focus.”
Superintelligence, like AGI (artificial general intelligence), has a vague and shifting definition in the AI industry. For Suleyman, it’s strictly about business and productivity. “Superintelligence is really about, ‘Can these models deliver production value to the millions of companies that rely on us to provide world-class language models?’” Suleyman says. “That’s our real focus. We want to deliver for developers, enterprises, and a huge number of consumers.” AI companies are under mounting pressure to grow revenue, and Microsoft’s plans echo OpenAI’s new strategy.
Microsoft’s reorganization has combined its enterprise and consumer teams under the Copilot AI banner. While Suleyman will continue to focus on big-picture strategy, Jacob Andreou, previously corporate vice president of product and growth for Microsoft AI, has become executive vice president, leading the engineering, growth, product, and design efforts of the newly combined team. The shift frees Suleyman to devote his time to pursuing superintelligence and developing Microsoft’s new frontier AI models, at a time when competition among the big AI companies, and the pressure to attract new paying consumers and business customers, are more intense than ever.
On Thursday, Microsoft announced a new transcription model that does just that. According to Suleyman, it runs at “half the GPU cost of other cutting-edge models,” which amounts to a “significant cost savings” for Microsoft.
The company touts MAI-Transcribe-1 as “pioneering the frontiers of speech recognition,” with the ability to transcribe meetings in 25 languages, subtitle videos, and analyze call center interactions. According to a Microsoft blog post announcing the model, it was built for “challenging” recording conditions such as background noise, low-quality audio, and overlapping audio, and it was trained on a combination of “human-curated” and machine-generated transcripts. Suleyman said the source recordings combined data from controlled sound booths, recordings by contractors tasked with capturing themselves amid ambient noise (from busy streets to children running around), and “a huge amount of data from the open web.”
The new transcription model is now available in Microsoft Foundry and the new Microsoft AI Playground, alongside the existing MAI-Voice-1 speech generation and MAI-Image-2 image generation models. Microsoft says this is the first time these models will be “widely commercially available.” MAI-Transcribe-1 can process audio files in MP3, WAV, and FLAC formats.
Suleyman credits the new model’s performance in testing to a small, focused team of 10 people. He says the modeling team is “free from bureaucracy” because it is surrounded by supporting teams that handle tasks like managing vendors and sourcing training data. Microsoft has adopted a similar strategy for its voice and image generation models, and other companies are making similar moves. Meta, Amazon, and Google are experimenting with flattening their organizations, and Anthropic has said it is also testing what small teams of developers can accomplish when given a degree of freedom over compute.
The new transcription model is part of Suleyman’s goal to deliver “human-centered” AI (a variation on Microsoft’s favorite AI buzzword, “humanist superintelligence”) that is useful to everyday people. “Everyone will have in their pocket an AI assistant that is truly world-class, accountable to them, aligned with their interests, and working on their behalf,” he said.
