Introducing BloombergGPT: A Large Language Model with 50 Billion Parameters Trained on a Wide Range of Financial Data

The 2020 release of GPT-3 was a compelling demonstration of the benefits of training very large autoregressive language models. With 175 billion parameters (roughly 100 times more than GPT-2), GPT-3 performs remarkably well on a variety of now-standard LLM tasks such as reading comprehension, open-ended question answering, and code generation. Many later models have replicated this performance. In addition, evidence shows that very large models exhibit emergent behavior: their size allows them to acquire abilities that smaller models lack. A well-known example of emergent behavior is few-shot prompting, where the model can learn a task from just a few examples supplied in the prompt. This ability rises well above random chance as language models are scaled up.
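To make few-shot prompting concrete, here is a minimal sketch of how such a prompt might be assembled for a financial sentiment task. The function name `build_few_shot_prompt`, the prompt layout, and the example headlines are all assumptions made for illustration; none of them come from the BloombergGPT work, and the sketch only builds the prompt text without calling any model.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    examples: List[Tuple[str, str]],
    query: str,
    instruction: str = "Classify the sentiment of each headline as positive, negative, or neutral.",
) -> str:
    """Assemble a few-shot prompt: instruction, labeled examples, then the unlabeled query."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Headline: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The last item is left unlabeled; an autoregressive model would be
    # expected to continue the text with a label after "Sentiment:".
    lines.append(f"Headline: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


# Hypothetical demonstrations, invented for illustration only.
EXAMPLES = [
    ("Company A raises full-year profit forecast", "positive"),
    ("Company B shares slide after earnings miss", "negative"),
    ("Company C to report quarterly results on Thursday", "neutral"),
]

prompt = build_few_shot_prompt(EXAMPLES, "Company D announces surprise dividend cut")
print(prompt)
```

The three labeled headlines play the role of the "few examples"; no model weights are updated, and the task is conveyed entirely through the prompt text.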

Broadly, few-shot prompting greatly expands the range of tasks a model can handle and lowers the barrier to entry for users looking to automate new language tasks. Since GPT-3, models with 280 billion, 540 billion, and 1 trillion parameters have been built. Several other factors important for developing high-performing LLMs have also been studied, including different training objectives, multilingual models, more efficient and compact models, and the determination of data- and parameter-efficient training sizes. These efforts have focused almost exclusively on general-purpose LLMs trained on datasets that span a wide range of subjects and domains. Although such datasets have included some specialized sources, such as biomedical publications, the focus has been on building LLMs with broad capabilities.

Recently, models trained only on domain-specific data have outperformed general-purpose LLMs on tasks within specific disciplines such as science and medicine, despite being significantly smaller. These findings motivate further development of domain-specific models. NLP technology plays an increasingly important role in the large and growing field of financial technology. Financial NLP tasks include sentiment analysis, named entity recognition, news classification, and question answering. Although the range of tasks is similar to that found in standard NLP benchmarks, the complexity and terminology of the financial domain call for a domain-specific system. An LLM focused on the financial domain would be valuable for all the reasons generative LLMs are attractive in general: few-shot learning, text generation, conversational systems, and so on.

Although masked language models tuned for the financial domain exist, no LLM has been tuned for or evaluated on financial tasks. Researchers from Bloomberg and Johns Hopkins University trained BloombergGPT, a 50-billion-parameter language model that supports a wide range of tasks within the financial industry. Rather than building a general-purpose LLM, or a small LLM trained exclusively on domain-specific data, they take a mixed approach. General models remove the need for specialization during training, cover many domains, and perform well across a wide variety of tasks. However, results from existing domain-specific models show that general models cannot replace them. Bloomberg supports a very large and diverse set of tasks that are well served by a general model, but the vast majority of its applications lie within the financial domain, where a specialized model serves better.

The researchers therefore set out to build a model that delivers best-in-class results on financial benchmarks while maintaining competitive performance on general-purpose LLM benchmarks. They achieve this by constructing the largest domain-specific dataset to date, drawing on Bloomberg's existing data creation, collection, and curation resources. Because Bloomberg is primarily a financial data company, its data analysts have gathered and curated financial language documents for over 40 years. The company maintains extensive archives of financial data covering a range of topics, with careful tracking of data sources and usage rights.

They combine this data with public datasets to build a large training corpus of over 700 billion tokens, and use a portion of it to train a BLOOM-style model with 50 billion parameters. The model is evaluated on standard LLM benchmarks, open financial benchmarks, and Bloomberg-internal benchmarks chosen to reflect its intended use. The results indicate that this mixed training approach yields a model that performs on par with or better than existing models on general NLP benchmarks while significantly outperforming them on in-domain financial tasks.

Check out the paper. All credit for this research goes to the researchers on this project.

Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing a Bachelor’s Degree in Data Science and Artificial Intelligence from the Indian Institute of Technology (IIT), Bhilai. He spends most of his time on projects aimed at harnessing the power of machine learning. His research interest is image processing and his passion is building solutions around it. He loves connecting with people and collaborating on interesting projects.
