MosaicML Releases Open Source 30B Parameter AI Model for Enterprise Applications


MosaicML Inc., a generative artificial intelligence startup that provides enterprises with the infrastructure to run machine learning services, has announced the open-source availability of MPT-30B.

The company said the MPT-30B model surpasses the quality of the original GPT-3 released by OpenAI LP in 2020. It does so with 30 billion parameters, about one-sixth of GPT-3’s 175 billion, which makes it faster to train and easier to deploy on local hardware.

This means that starting today, developers and businesses can fine-tune and deploy their own generative AI models in-house, at GPT-3-grade quality, with orders of magnitude less compute than the original required. That makes generative AI applications accessible to more businesses without compromising data privacy or security.

MPT-30B was trained on longer sequences than GPT-3, up to 8,000 tokens, and can handle even longer context windows, making it suitable for data-intensive enterprise applications. It also outperforms many current models in its weight class, including Meta Platforms Inc.’s popular LLaMA family and the Technology Innovation Institute’s recent Falcon model, which was trained with 2,000-token sequences.

The news follows the early May launch of MosaicML’s MPT-7B foundation models, which include Base, Instruct, Chat and StoryWriter variants. Since then, those models have been downloaded more than 2 million times.

Jonathan Frankle, principal researcher at MosaicML, told SiliconANGLE that building the new model taught the company a good deal about scaling AI models. “It’s hard to scale,” Frankle said. “I think it’s underestimated how hard it is to scale. We’ve seen others in the open source space run into challenges. We have certainly faced challenges.”

Although the model wasn’t designed specifically for coding, he said, it is particularly good at it. Developers will also find that it works well as a chatbot and at following instructions for tasks such as generating summaries or answering questions.

Why 30 billion parameters? Frankle explained that it’s all about matching or surpassing the quality of GPT-3 while remaining easy to run on local hardware. “So the magic number that represents the quality of GPT-3 tends to be 30 billion,” he said. He noted that GPT-3 was a larger model, but that researchers have learned a lot since then about the right balance, and that a 30-billion-parameter model is one that fits on an A100.

Frankle was referring to the Nvidia A100, a high-performance graphics processing unit used to run the computations behind generative AI workloads. Models that exceed roughly 30 billion parameters must be split across multiple GPUs in parallel, or squeezed with other techniques, to fit. Others, such as the Falcon 40B model, don’t fit on a single A100 and so cross the threshold at which an expensive multi-GPU setup is required.
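
As a rough illustration of that single-GPU threshold, the sketch below estimates the memory needed just to hold a model's weights. The 80 GB A100 capacity, 16-bit (2-byte) weights and 10% memory headroom are assumptions made for illustration, not figures from MosaicML, and the function names are hypothetical.

```python
# Back-of-the-envelope estimate: weights only, ignoring activations and caches.
# Assumptions (not from the article): 2 bytes per parameter (16-bit weights),
# an 80 GB A100, and 10% of GPU memory reserved as headroom.


def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed to store the weights, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


def fits_on_one_gpu(n_params: float, gpu_memory_gb: float = 80.0,
                    bytes_per_param: int = 2, headroom: float = 0.10) -> bool:
    """True if the weights fit in the GPU memory left after headroom."""
    usable_gb = gpu_memory_gb * (1.0 - headroom)
    return weight_memory_gb(n_params, bytes_per_param) <= usable_gb


for name, n_params in [("30B", 30e9), ("40B", 40e9), ("175B", 175e9)]:
    gb = weight_memory_gb(n_params)
    print(f"{name}: {gb:.0f} GB of weights, fits on one GPU: {fits_on_one_gpu(n_params)}")
```

Under these assumptions, 30 billion parameters need about 60 GB and fit, while 40 billion need about 80 GB and do not.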

The open-source MPT-30B foundation model is now available for developers to download from the HuggingFace Hub. Developers can fine-tune the model with their own data on their own hardware, or deploy it for inference on their own infrastructure.
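
For readers who want to try the download, a minimal sketch using the Hugging Face `transformers` library might look like the following. The repository id `mosaicml/mpt-30b`, the bfloat16 setting and the helper name `load_mpt` are assumptions for illustration and do not come from the article; check the model card on the Hub for the recommended settings.

```python
# Hypothetical helper; calling it downloads tens of gigabytes of weights.
# Assumptions (not from the article): the repository id "mosaicml/mpt-30b",
# bfloat16 weights, and that `transformers` and `torch` are installed.

MODEL_ID = "mosaicml/mpt-30b"  # assumed Hub repository id


def load_mpt(model_id: str = MODEL_ID):
    """Download the tokenizer and model from the Hugging Face Hub."""
    # Imported lazily so that defining this helper needs no heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # 16-bit weights, half the memory of 32-bit
        trust_remote_code=True,      # the repository may ship custom model code
    )
    return tokenizer, model
```

Calling `load_mpt()` returns a tokenizer and model that can then be fine-tuned or used for inference on local hardware.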

MosaicML also offers its own AI infrastructure as a service for inference through a managed MPT-30B-Instruct endpoint, so developers don’t have to secure their own GPUs. The price is $0.002 per 1,000 tokens, which is 10 times cheaper than OpenAI’s DaVinci.
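
To put the per-token price in concrete terms, the short calculation below works out the cost of a given token volume. The $0.002 figure is the one quoted above; the comparison price is only inferred from the "10 times cheaper" claim and is an assumption, as is the function name.

```python
# The MosaicML price is quoted in the article; the DaVinci price below is
# only inferred from the "10 times cheaper" claim, not quoted anywhere.

MOSAICML_PRICE_PER_1K = 0.002                              # USD per 1,000 tokens
IMPLIED_DAVINCI_PRICE_PER_1K = MOSAICML_PRICE_PER_1K * 10  # inferred assumption


def inference_cost_usd(n_tokens: int, price_per_1k: float) -> float:
    """Cost in U.S. dollars of processing n_tokens at a per-1,000-token price."""
    return n_tokens / 1000 * price_per_1k


tokens = 1_000_000
print(f"MosaicML endpoint: ${inference_cost_usd(tokens, MOSAICML_PRICE_PER_1K):.2f}")
print(f"Implied DaVinci:   ${inference_cost_usd(tokens, IMPLIED_DAVINCI_PRICE_PER_1K):.2f}")
```

For 1 million tokens, this works out to about $2 on the managed endpoint versus an implied $20.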

Frankle added that MosaicML will continue to release open-source models. “At the end of the day, it all boils down to what we do as a company and as researchers. Open source is very important to me, and it’s a matter of perspective,” he said. He described the open-source releases as, in a way, a demo: “We’re training models at this scale for our customers.”

Frankle said scaling up from 7 billion to 30 billion parameters is just the first step, calling it the “MPT 1 process.” It proves that the research team can build a large model and iterate on it, with even larger, higher-quality models to come.

“The next step is building a larger model and the MPT 2 process, which we are working on,” Frankle said. “This will reduce costs, build better models, better optimizations, better architectures from many perspectives, and keep the plan moving forward.”

Image: Pixabay
