
The release of OpenLLaMA, an open-source reproduction of Meta AI’s LLaMA model, marks a new development in large language models. Its creators have published a permissively licensed 7B model trained on 200 billion tokens. The release includes PyTorch and JAX weights, evaluation results, and a comparison of the pre-trained OpenLLaMA model against the original LLaMA model. This matters for machine learning, especially for researchers who need large language models but have difficulty gaining access to proprietary ones.
The creators of OpenLLaMA share details of how they trained the model on the RedPajama dataset, a reproduction of the LLaMA training dataset containing over 1.2 trillion tokens. They followed the same preprocessing and training hyperparameters as the original LLaMA paper, including model architecture, context length, training steps, learning rate schedule, and optimizer. The only difference between their approach and the original is the dataset: OpenLLaMA uses RedPajama instead of the dataset used for the original LLaMA.
The model was trained on cloud TPU-v4 chips using EasyLM, a JAX-based pipeline developed for training and fine-tuning language models. The team combined normal data parallelism with fully sharded data parallelism (also known as ZeRO stage 3) to balance training throughput and memory usage. Overall, the training run achieved a throughput of over 1,900 tokens per second per TPU-v4 chip.
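To put the throughput figure in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only two numbers from the article (200 billion tokens and 1,900 tokens per second per chip). The article does not say how many chips were used, so the chip count in the usage lines is a purely hypothetical value for illustration. Because the reported throughput is "over" 1,900, the result is best read as an approximate upper bound on compute time.

```python
def chip_hours(total_tokens: float, tokens_per_sec_per_chip: float) -> float:
    """Total accelerator time, in chip-hours, to process total_tokens."""
    chip_seconds = total_tokens / tokens_per_sec_per_chip
    return chip_seconds / 3600.0


def wall_clock_days(total_tokens: float,
                    tokens_per_sec_per_chip: float,
                    num_chips: int) -> float:
    """Wall-clock days, assuming throughput scales linearly with chip count."""
    hours = chip_hours(total_tokens, tokens_per_sec_per_chip) / num_chips
    return hours / 24.0


# Figures taken from the article.
TOKENS_TRAINED = 200e9       # 200 billion tokens
THROUGHPUT = 1900.0          # tokens per second per TPU-v4 chip

# ASSUMPTION: hypothetical chip count, not stated in the article.
HYPOTHETICAL_NUM_CHIPS = 256

if __name__ == "__main__":
    print(f"chip-hours: {chip_hours(TOKENS_TRAINED, THROUGHPUT):,.0f}")
    days = wall_clock_days(TOKENS_TRAINED, THROUGHPUT, HYPOTHETICAL_NUM_CHIPS)
    print(f"days on {HYPOTHETICAL_NUM_CHIPS} chips: {days:.1f}")
```

At the quoted rate, 200 billion tokens works out to roughly 29,000 chip-hours. The wall-clock figure depends entirely on the assumed chip count and on linear scaling, which real training runs only approximate.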
OpenLLaMA was evaluated on a range of tasks using lm-evaluation-harness. The results were compared with the original LLaMA model and GPT-J (a 6B-parameter model trained on the Pile dataset by EleutherAI). The metrics for the original LLaMA model were generated by running it on the same tasks. These LLaMA results differ slightly from those reported in the original LLaMA paper, which may be due to differences in evaluation protocols. According to the results presented, OpenLLaMA performed as well as or better than the original LLaMA and GPT-J on most tasks. Because this checkpoint was trained on 200 billion tokens, compared with the 1 trillion tokens used for the original LLaMA and the 500 billion tokens used for GPT-J, its performance is expected to improve further as training continues.
To encourage feedback and collaboration from the community, the team behind OpenLLaMA has released a preview checkpoint of the weights. The weights are available in two formats: the EasyLM format for use with the EasyLM framework, and the PyTorch format for use with the Hugging Face transformers library. Unlike the original LLaMA model, OpenLLaMA’s tokenizer and weights are trained completely from scratch, so users no longer need to obtain the original LLaMA tokenizer and weights. However, it is essential to note that OpenLLaMA uses a BOS (beginning of sentence) token (id=1) during training, so this token should be prepended for best performance in few-shot evaluation. The preview checkpoint weights and the EasyLM framework are licensed under the Apache 2.0 license. The team is currently focused on completing training on the entire RedPajama dataset to allow a like-for-like comparison between the original LLaMA and OpenLLaMA. The team is also training a smaller 3B model for low-resource use cases and plans to release more updates in the near future.
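The BOS requirement described above can be illustrated with a minimal Python sketch. The only detail taken from the article is the token id (1). The helper name `ensure_bos` and the example token ids are hypothetical, and the code deliberately works on plain lists of integers rather than on any particular tokenizer API, since many tokenizers can also add this token themselves.

```python
from typing import List

# From the article: OpenLLaMA uses a BOS token with id=1 during training.
BOS_TOKEN_ID = 1


def ensure_bos(token_ids: List[int], bos_id: int = BOS_TOKEN_ID) -> List[int]:
    """Return a copy of token_ids that starts with the BOS token.

    Hypothetical helper: prepends bos_id unless it is already the
    first element, so applying it twice does not duplicate the token.
    """
    if token_ids and token_ids[0] == bos_id:
        return list(token_ids)
    return [bos_id] + list(token_ids)


if __name__ == "__main__":
    # Made-up token ids standing in for an encoded few-shot prompt.
    prompt_ids = [306, 4658, 278]
    print(ensure_bos(prompt_ids))
```

Making the helper idempotent is a small design choice: evaluation pipelines often pass prompts through several preprocessing steps, and a doubled BOS token would be as much of a mismatch with training as a missing one.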
Niharika is a technical consulting intern at Marktechpost. She is a third-year undergraduate pursuing her Bachelor’s degree at the Indian Institute of Technology (IIT), Kharagpur. She has a keen interest in machine learning, data science, and AI, and is an avid reader of the latest developments in these fields.
