New AI Research Proposes Pythia: A Suite of Decoder-Only Autoregressive Language Models Ranging from 70M to 12B Parameters

AI News


Source: https://arxiv.org/pdf/2304.01373.pdf

Transformer-based models are among the most advanced and capable classes of models in existence today. They have driven a paradigm shift in the rapidly developing field of AI, given their vast number of use cases, such as natural language processing (NLP) generative tasks, text-to-image generation, and 3D protein structure prediction. Their use has also grown exponentially over the past few years as researchers continue to build larger and more sophisticated architectures. Yet little is known about how and why these models work so well, which makes it important to understand how a large language model (LLM) evolves over the course of training. Moreover, although previous studies have shown that certain approximately regular patterns appear as language models are scaled, connecting those patterns to how trained models behave across scales remains uncharted territory. One of the main reasons is the lack of publicly available LLMs that meet all of the requirements researchers have for such studies.

To address this problem, the non-profit AI research group EleutherAI recently announced Pythia, a collection of 16 LLMs trained on public data in the same order and designed specifically to facilitate scientific research. Pythia is currently the only publicly available model suite whose models were all trained on the same data in the same order while spanning orders of magnitude in scale. The team has released 154 checkpoints for each of the 16 models, with sizes ranging from 70M to 12B parameters. Additionally, all corresponding data and the tools needed to download it and replicate the exact training process are publicly available to facilitate further research. These properties enabled the researchers behind Pythia to run experiments probing how gender bias, memorization, and few-shot learning are affected by training data and model scale.
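To give a rough sense of what this reproducibility looks like in practice, here is a minimal sketch of loading one of the smaller suite members at a specific training checkpoint through the Hugging Face transformers library. The repository name and the step-numbered revision used below are assumed to follow the naming scheme described in the Pythia release.

```python
# Minimal sketch: load a Pythia model at a specific training checkpoint.
# Assumes the Hub ID "EleutherAI/pythia-70m" and step-numbered revisions
# such as "step3000" published alongside the Pythia release.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/pythia-70m"
revision = "step3000"  # one of the 154 released checkpoints

tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
model = AutoModelForCausalLM.from_pretrained(model_id, revision=revision)

# Quick sanity check: generate a short continuation from the checkpoint.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0]))
```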

No other collection of models is accessible to the general public, follows a well-documented training process, and maintains uniformity across scales, and this is where the Pythia researchers made their contribution. As noted above, all of the models are publicly available and were trained on the Pile, a collection of English-language data widely used for developing LLMs (especially large autoregressive transformers). Pythia was designed so that researchers can analyze every intermediate checkpoint, allowing them to associate changes in model behavior with specific points in training. Additionally, the training process and hyperparameters are fully documented to support future research.
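As an illustration of the kind of checkpoint-level analysis this enables, the sketch below tracks how a model's loss on a fixed probe sentence changes across a few intermediate checkpoints. The probe text and the particular subset of revisions chosen here are hypothetical examples, not experiments from the paper.

```python
# Illustrative sketch (not from the paper): track loss on a fixed probe
# sentence across intermediate Pythia checkpoints to see training dynamics.
# The probe text and the subset of revisions below are hypothetical choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/pythia-160m"
checkpoints = ["step1000", "step10000", "step143000"]  # assumed revision names
probe = "The quick brown fox jumps over the lazy dog."

tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer(probe, return_tensors="pt")

for step in checkpoints:
    model = AutoModelForCausalLM.from_pretrained(model_id, revision=step)
    model.eval()
    with torch.no_grad():
        # Feeding the inputs as labels yields the average next-token loss.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    print(f"{step}: loss = {loss.item():.3f}")
```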


EleutherAI’s main goal in developing Pythia is to understand the capabilities of large language models and to support future scientific research aimed at overcoming their limitations. To this end, the researchers focused on three case studies, namely gender bias mitigation, memorization in large language models, and the effect of term frequency on few-shot performance, and used them to demonstrate Pythia’s experimental methodology. Through these experiments, the researchers concluded that this highly controlled setup can yield new insights into the dynamics of LLMs and their training. They further note that such case studies could not be carried out with any existing model suite.

In conclusion, EleutherAI’s Pythia is a collection of LLMs trained with consistent data ordering and model architecture across multiple orders of magnitude of scale. The work centers on three case studies, covering gender debiasing, memorization, and term frequency effects, which demonstrate how Pythia enables experimentation at a level of detail not previously possible with public model suites. The researchers hope their findings and analysis will stimulate further investigation into how language models change during training and how model size relates to the patterns observed along the way.


Check out the paper and the GitHub repository. All credit for this research goes to the researchers of this project. Also, don’t forget to join our 18k+ ML SubReddit, Discord channel, and email newsletter, where we share the latest AI research news, cool AI projects, and more.

Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Goa. She is passionate about machine learning, natural language processing, and web development, and enjoys learning more about the technical field by taking part in challenges.




