Apple releases OpenELM, a slightly more accurate LLM • The Register

Apple, not normally known for its openness, has released a generative AI model called OpenELM, which apparently outperforms a set of other language models trained on public datasets.

Not that it outperforms them by much. Compared to OLMo, which debuted in February, OpenELM is 2.36 percent more accurate while using half as many pre-training tokens. But it may be enough to remind people that Apple is no longer content to be the AI industry's wallflower.

Apple's claim to openness comes from its decision to release not just its models, but its training and evaluation framework.

“Diverging from prior practices that only provide model weights and inference code, and pre-train on private datasets, our release includes the complete framework for training and evaluation of the language model on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations,” explain the eleven Apple researchers behind the associated technical paper.

And in a departure from academic practice, the authors' email addresses are not listed. Chalk it up to Apple's interpretation of openness, which is somewhat comparable to the not-very-open OpenAI.

The accompanying software is not released under a recognized open source license. It is not unreasonably restrictive, but it does make clear that Apple reserves the right to pursue patent claims if derivative works based on OpenELM are deemed to infringe on its rights.

OpenELM uses a technique called layer-wise scaling to allocate parameters more efficiently in the transformer model. So instead of each layer having an identical set of parameters, OpenELM's transformer layers have different configurations and parameter counts. The result is better accuracy, measured as the percentage of correct predictions from the model in benchmark tests.
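
To give a rough sense of the idea (a generic sketch, not Apple's code): a layer-wise scaling scheme interpolates the per-layer attention head count and feed-forward width across the depth of the network, rather than keeping them constant, along these lines.

    # Generic illustration of layer-wise scaling (not Apple's implementation).
    # Instead of giving every transformer layer the same width, the number of
    # attention heads and the feed-forward width grow from the first layer to
    # the last, so parameters are not spread uniformly across the stack.

    def layerwise_configs(num_layers, base_heads, base_ffn_dim,
                          min_scale=0.5, max_scale=1.5):
        configs = []
        for i in range(num_layers):
            # Linearly interpolate a scale factor across the depth of the model.
            t = i / max(num_layers - 1, 1)
            scale = min_scale + t * (max_scale - min_scale)
            configs.append({
                "layer": i,
                "num_heads": max(1, round(base_heads * scale)),
                "ffn_dim": int(base_ffn_dim * scale),
            })
        return configs

    for cfg in layerwise_configs(num_layers=4, base_heads=8, base_ffn_dim=2048):
        print(cfg)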

OpenELM was pre-trained on publicly available datasets, including RedPajama (which draws on GitHub, a pile of books, Wikipedia, StackExchange posts, ArXiv papers, and more) and Dolma (which includes Reddit, Wikibooks, Project Gutenberg, and more). The model can be used as you might expect: give it a prompt and it attempts to respond or auto-complete it.

One of the highlights of the release is that it comes with “code to convert models to MLX library for inference and fine-tuning on Apple devices.”

MLX is a framework released last year for running machine learning on Apple silicon. The ability to work locally on Apple devices, rather than over a network, should make OpenELM more interesting to developers.
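
As a rough illustration (not Apple's reference code), running a converted checkpoint locally with the open source mlx-lm package might look something like this; the local model path is hypothetical and assumes the conversion step has already been run on an Apple silicon Mac.

    # Illustrative sketch only, not Apple's reference code.
    # Assumes: `pip install mlx-lm` on an Apple silicon Mac, and an OpenELM
    # checkpoint already converted to MLX format at the hypothetical path below.
    from mlx_lm import load, generate

    model, tokenizer = load("./openelm-270m-mlx")  # hypothetical converted checkpoint
    reply = generate(
        model,
        tokenizer,
        prompt="Briefly explain layer-wise scaling in transformers.",
        max_tokens=128,
    )
    print(reply)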

“Apple's release of OpenELM is a significant advancement for the AI community, delivering efficient on-device AI processing that is ideal for mobile apps and IoT devices with limited computing power,” Shahar Chen, CEO and co-founder of AI services company Aquant, told The Register. “This will enable rapid local decision-making, which is essential for everything from smartphones to smart home devices, expanding the potential of AI in everyday technology.”

Apple has been keen to demonstrate the capabilities of its chip architecture for hardware-accelerated machine learning ever since Cupertino introduced its Neural Engine in 2017. Even so, while OpenELM may score higher on accuracy benchmarks, it comes up short in terms of performance.

“Despite OpenELM's higher accuracy for a similar parameter count, we observe that it is slower than OLMo,” the paper explains, citing tests run using Nvidia's CUDA on Linux as well as the MLX version of OpenELM on Apple silicon.

The less-than-winning showing, Apple's researchers explained, is due to their “naive implementation of RMSNorm,” a technique for normalizing data in machine learning. They plan to explore further optimizations in the future.
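
For context, RMSNorm (root mean square layer normalization) rescales each activation vector by its root mean square and applies a learned per-feature gain; a minimal sketch of the operation, not Apple's implementation, looks like this.

    # Minimal RMSNorm sketch (not Apple's implementation).
    # y = x / sqrt(mean(x^2) + eps) * weight
    import numpy as np

    def rms_norm(x, weight, eps=1e-6):
        # x: (..., hidden_dim) activations; weight: (hidden_dim,) learned gain
        rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
        return (x / rms) * weight

    hidden = np.random.randn(4, 8).astype(np.float32)  # toy batch of activations
    gain = np.ones(8, dtype=np.float32)                 # learned gain, initialized to 1
    print(rms_norm(hidden, gain).shape)                 # (4, 8)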

OpenELM is available in pre-trained and instruction-tuned models with 270 million, 450 million, 1.1 billion, and 3 billion parameters. Anyone using it is cautioned to do their due diligence before trying the model for anything meaningful.

“The release of OpenELM models aims to empower and enrich the open research community by providing access to state-of-the-art language models,” the paper states. “Trained on publicly available datasets, these models are made available without any safety guarantees.” ®


