Microsoft takes on OpenAI with new AI models for voice, images • The Register



Microsoft on Thursday announced public previews of three homegrown machine learning models focused on speech recognition, speech synthesis, and image generation.

The release makes the Windows biz look more like a direct competitor to OpenAI than an investor in it. Redmond held an OpenAI stake worth approximately $135 billion as of October last year.

The models are MAI-Transcribe-1, a speech recognition model that delivers “enterprise-grade accuracy across 25 languages at approximately 50% lower GPU cost than leading alternatives”; MAI-Voice-1, a voice generation model that can generate 60 seconds of audio in less than 1 second on a single GPU; and MAI-Image-2, a text-to-image model that adds to the despair of digital artists.

OpenAI happens to offer its own speech recognition, speech generation, and text-to-image models.

Microsoft's models are available through Foundry (formerly Azure AI Studio), a platform for developing AI agents and applications.

Naomi Moneypenny, who leads the Microsoft Azure AI Foundry Models product team, discussed the models' arrival in a blog post.

“These are the same models already in our products like Copilot, Bing, PowerPoint, and Azure Speech, and are now available exclusively on Foundry for developers to use,” she wrote.

The models seem suited to common enterprise use cases, such as building a customer support agent that can recognize speech and generate responses. Moneypenny suggests they could also be useful for captioning large events and conferences, subtitling and archiving media, education and training, gathering customer and market insights from focus groups, and more.

Microsoft is already eating its own dog food here. Copilot's Audio Expressions runs on MAI-Voice-1, while Copilot's Voice Mode transcription service uses MAI-Transcribe-1.

Developers can experiment with these two models via Azure Speech.

When Microsoft announced that it had renegotiated its contract with OpenAI, the Windows biz indicated that the partnership would last until at least 2032, a scenario that assumes the AI market does not collapse. But the announcement also pointed to a competitive field. “Microsoft can now independently pursue AGI [artificial general intelligence] alone or in partnership with third parties,” the company said at the time. That statement by itself leaves Microsoft free to go its own way with AI under the guise of AGI research.

Microsoft has an incentive to reduce its risk. The company's OpenAI relationship became strained in January when Microsoft investors complained that the company was spending too much on OpenAI. OpenAI, the leader of the AI hype, is burning cash and is expected to lose $14 billion this year, according to internal forecasts published by The Information. It is reportedly working internally to narrow its focus to enterprise customers, and late last month it shut down Sora 2, a video generator that burned tokens but wasn't particularly useful.

Two weeks ago, Microsoft CEO Satya Nadella announced leadership changes that will impact the company’s Copilot product and superintelligence efforts. Jacob Andreou has been selected to serve as Microsoft’s EVP across consumer and commercial products, leading the company’s Copilot experience and reporting directly to Nadella. Copilot is currently focused on four areas: Copilot Experiences, Copilot Platform, Microsoft 365 Apps, and AI Models.

Andreou's AI Models team presumably won't just check in with OpenAI to see what models are available. And in case Microsoft's model ambitions weren't clear enough, Nadella said Mustafa Suleyman will continue to steer Microsoft's AI research, which wouldn't be necessary if the ambition were to keep relying on OpenAI. ®