OctoML Debuts Self-Optimizing Computing Service for Generative AI Applications

Artificial intelligence optimization startup OctoML Inc. is switching gears today with the launch of the industry’s first self-optimizing computing service for AI models.

The new service, called OctoAI, is foundational infrastructure for developers looking to build and extend AI applications on the model of their choice, including open-source and custom-built models, the company said. It is fully managed and gives developers easy access to the cost-effective and scalable accelerated computing infrastructure they need to create, customize and run AI models for their most specialized applications.

The launch of OctoAI marks a step forward for OctoML, which first launched in 2019 with an AI optimization platform based on the open-source Apache TVM framework. That platform was intended to help developers improve the performance of their models, but as companies race to take advantage of the latest developments in generative AI, OctoML has expanded its focus to running AI applications.

Alongside OctoAI, the company offers a library of what it says are the world’s fastest and most affordable generative AI models, accelerated by its optimization platform. It includes model templates such as Stable Diffusion 2.1, Dolly v2, Llama 65B, Whisper, FLAN-UL2, Vicuna and more.

OctoML CEO Luis Ceze described the company’s transition, saying that efficient computing is essential to enable generative AI applications.

“Companies are eager to build AI-powered solutions, but the process of getting models from development to production is highly complex and often requires expensive specialist talent and infrastructure,” he said. “OctoAI is about making models work for the business, not the other way around. We remove all the complexity so developers can focus on building great applications instead of worrying about managing infrastructure.”

With OctoAI, companies can take a desired model template or design their own, fine-tune it to meet very specific requirements, and integrate the completed model into their application development workflow. Customers can then balance costs by choosing among different hardware options for running their models, with a clear understanding of the price/performance trade-offs.

“It gives users freedom because they can choose a model or bring their own custom model,” Ceze told SiliconANGLE in an interview with theCUBE (below). “Second, we optimize the model, choose the right hardware, make sure you get the right trade-off between performance and efficiency, so you get more efficiency. We offer a collection of ultra-optimized models.”

Cost-effective inference

OctoML operates in the highly competitive machine learning deployment platform space, and this announcement will help differentiate it by giving users a way to optimize their AI models for cost-effective inference, said Andy Thurai, vice president and principal analyst at Constellation Research Inc.

Thurai explained that while the excessive cost of training AI models has received a lot of attention, few people talk about the cost of inference, which is essentially the cost of keeping an AI model running in production. According to Thurai, inference costs are often orders of magnitude higher than AI training costs, especially when an application reaches millions of users.

“Scaling up AI operations at these costs would be very inefficient,” Thurai said. “OctoML’s compute service provides an optimized structure on the cloud for enterprises to efficiently run their AI models. [This] is more attractive for running production versions of machine learning models.”

Thurai said one of the biggest advantages is that some of OctoML’s optimized models can run almost as efficiently on Nvidia Corp.’s older A10G graphics processing unit as they do on the newer A100 GPU. He said this should work in the company’s favor, as there is currently a shortage of A100 GPUs due to extremely high demand.

“With the surplus availability of A10G, enterprises can use A10G to run AI applications with performance similar to what A100 GPUs offer, rather than waiting for access to those resources,” Thurai continued. “Customers also have the option of fine-tuning publicly available models with their own datasets. OctoML’s main competitor here is Hugging Face, and the company’s claim that it is 5x cheaper and 33% faster is very convincing.”

According to OctoML, early adopters of OctoAI are already building a wide variety of applications using generative AI models such as Stable Diffusion and FLAN-UL2.

“They have two things in common,” Ceze said. “First, model customization is the cornerstone of giving customers a unique experience, which is what differentiates them. Second, they need the ability to rapidly scale their services with flexible hardware options.”

OctoML gives developers a “cocktail” of AI models, including open-source AI models, and makes them easier to manage. “It’s been really hard to get started, manage it and run it all,” Jon Turow, partner at Madrona Venture Group, an investor in OctoML, told SiliconANGLE. “What’s interesting about what Octo, Luis and the team are working on is that for the first time, open-source AI can offer the usability that you can get with a closed model.”

Ceze and Turow recently spoke with John Furrier, host of theCUBE, SiliconANGLE Media’s video studio. Here is the full interview:

Image: OctoML
