Bloomberg Uses 1.3 Million GPU Hours for Large Language Model Developed In-House



Financial data firm Bloomberg is trying to prove there are smarter ways to build artificial intelligence applications without the ethical and security concerns that plague the likes of ChatGPT.

Bloomberg recently released BloombergGPT, its homegrown large language model with 50 billion parameters targeted at financial applications. The model builds on the knowledge base Bloomberg has collected over the last 15 years and is available to customers as part of the Terminal product.

The model does not have the scope of ChatGPT, which is based on the 175-billion-parameter GPT-3. But for domain-specific applications such as finance, smaller models are the way to go, researchers at Bloomberg and Johns Hopkins University argue in an academic paper.

BloombergGPT also offers ChatGPT-style chatbot functionality, which the researchers say is more accurate on financial tasks than comparable models with more parameters.

“Generic models cover many domains, are able to perform at a high level across diverse tasks, and remove the need for specialization during training. However, results from existing domain-specific models show that generic models cannot replace them,” the researchers wrote.

Other technology executives also advocate smaller models with billions of parameters, especially for scientific applications. Smaller models can yield more accurate results for a given domain, can be trained much faster than general-purpose models like GPT-3, and require fewer computing resources.

Bloomberg allocated nearly 1.3 million GPU hours of training time to BloombergGPT on Nvidia A100 GPUs in Amazon’s AWS cloud. Training ran on a 64-node cluster, with each node holding eight Nvidia A100 GPUs (the 40GB variant).
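As a back-of-the-envelope check (our own arithmetic, not a figure from the paper), that budget works out to roughly three and a half months of wall-clock time if the whole cluster runs continuously:

```python
# Rough estimate (not from the paper): converting the GPU-hour budget into
# wall-clock time on the cluster described above.
nodes = 64                # compute nodes
gpus_per_node = 8         # Nvidia A100 40GB GPUs per node
total_gpus = nodes * gpus_per_node          # 512 GPUs in total
gpu_hours = 1_300_000     # approximate training budget

wall_clock_hours = gpu_hours / total_gpus   # ~2,540 hours
wall_clock_days = wall_clock_hours / 24     # ~106 days

print(f"{total_gpus} GPUs -> ~{wall_clock_days:.0f} days of wall-clock training")
```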

Within each node, the GPUs were linked with Nvidia’s proprietary NVSwitch interconnect at transfer speeds of 600 GBps. The compute nodes were connected to one another over AWS’s Elastic Fabric Adapter at 400 Gbps, with Nvidia’s GPUDirect enabling direct GPU-to-network transfers.

Bloomberg used Amazon’s FSx for Lustre file system, widely used in high-performance computing, for fast access to files. The file system supported read and write throughput of up to 1,000 MBps.

BloombergGPT is an example of a company using Amazon’s cloud services to train large language models. ChatGPT runs on Nvidia GPUs in Microsoft’s Azure service. Google published a paper this week about large language models running on a supercomputer with 4,096 TPUs (Tensor Processing Units).

The model was too large to fit in the combined memory of the distributed GPUs without adjustments, so Bloomberg made several optimizations to train it. One was to shard training across 128 GPUs, with four copies of the model kept to cope with node swaps or glitches. Another was switching to BF16 arithmetic, which reduces training memory requirements while keeping the parameters in FP32.
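To illustrate that precision trade-off, here is a minimal PyTorch sketch (not Bloomberg’s training code, and it assumes a CUDA GPU is available): the forward and backward passes run in BF16 under autocast, while the master weights and optimizer state stay in FP32.

```python
import torch
from torch import nn

# Illustrative mixed-precision loop (not Bloomberg's code): the model's master
# weights and optimizer state remain in FP32, while the forward and backward
# passes run in BF16 via autocast to cut activation memory.
model = nn.Linear(1024, 1024).cuda()          # stand-in for a transformer block
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 1024, device="cuda")
target = torch.randn(8, 1024, device="cuda")

with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = nn.functional.mse_loss(model(x), target)  # computed in BF16

loss.backward()        # gradients land on the FP32 master weights
optimizer.step()       # optimizer update happens in FP32
optimizer.zero_grad()
```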

“After trying different techniques, we achieved 102 TFLOPs on average, with each training step taking 32.5 seconds,” the researchers wrote.
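For context, that sustained figure can be compared against the A100’s peak dense BF16 throughput of 312 TFLOPS per GPU, a rough utilization check of our own rather than one from the paper:

```python
# Rough utilization check (ours, not from the paper): sustained vs. peak
# per-GPU throughput.
achieved_tflops = 102          # average per-GPU throughput reported by the researchers
a100_peak_bf16_tflops = 312    # Nvidia A100 peak dense BF16 throughput

utilization = achieved_tflops / a100_peak_bf16_tflops
print(f"~{utilization:.0%} of peak BF16 throughput")   # roughly a third of peak
```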

Bloomberg drew 54% of the data set, or 363 billion tokens of financial documents dating back to 2007, from its internal archives. For training, the company stripped the formatting and templates from the data before feeding it into the training system. The remaining 345 billion tokens came from public press releases, Bloomberg news articles, other public documents, and even Wikipedia. (The data is measured in tokens, the small chunks of text a language model actually processes, rather than in whole documents.)
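As a quick illustration of what a token is, the snippet below uses the open-source tiktoken library as a stand-in; BloombergGPT trains its own tokenizer, so its exact token boundaries differ.

```python
# Illustration of tokenization using the open-source tiktoken library as a
# stand-in; BloombergGPT uses its own tokenizer, so the splits will differ.
import tiktoken

enc = tiktoken.get_encoding("gpt2")
text = "Bloomberg trained a 50-billion-parameter model on financial documents."
tokens = enc.encode(text)

print(len(tokens))           # roughly a dozen tokens for this one sentence
print(enc.decode(tokens))    # decoding the tokens reproduces the original text
```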

The researchers kept training sequences 2,048 tokens long to sustain the highest levels of GPU utilization.
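A minimal sketch of how tokenized documents can be packed into those fixed-length sequences follows; it illustrates the general technique with helper names of our own choosing, not Bloomberg’s actual pipeline.

```python
from typing import Iterable, List

SEQ_LEN = 2048  # fixed training sequence length used to keep GPUs saturated

def pack_sequences(token_streams: Iterable[List[int]], eos_id: int) -> List[List[int]]:
    """Concatenate tokenized documents (separated by an end-of-text token)
    and slice the stream into fixed-length training sequences."""
    buffer: List[int] = []
    sequences: List[List[int]] = []
    for doc_tokens in token_streams:
        buffer.extend(doc_tokens)
        buffer.append(eos_id)                      # mark the document boundary
        while len(buffer) >= SEQ_LEN:
            sequences.append(buffer[:SEQ_LEN])     # emit one full 2,048-token chunk
            buffer = buffer[SEQ_LEN:]
    return sequences                               # any short remainder is dropped here
```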

“Since we are limited by data, we choose the largest model that we can, while ensuring that we are able to train on all our tokens and still leave about 30% of the total compute budget as a buffer for unforeseen failures, retries, and restarts,” the researchers wrote.
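That budgeting logic can be sketched with the common rule of thumb that training compute is roughly 6 × N × D FLOPs for a model with N parameters trained on D tokens. The function below is our own illustration, not the paper’s exact procedure (the final 50-billion-parameter choice also reflects scaling-law analysis), and the example inputs are hypothetical.

```python
# Hypothetical sizing sketch (ours, not the paper's exact method) using the
# approximation that training compute C ≈ 6 * N * D FLOPs.

def largest_trainable_model(total_flops: float, tokens: float, buffer: float = 0.30) -> float:
    """Largest parameter count N that fits the compute budget after reserving
    a fraction of it for failures, retries, and restarts."""
    usable_flops = total_flops * (1.0 - buffer)
    return usable_flops / (6.0 * tokens)

# Example with hypothetical inputs: a 1e23-FLOP budget and 700 billion tokens.
print(f"{largest_trainable_model(1e23, 700e9) / 1e9:.0f}B parameters")  # ~17B
```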

Bloomberg published the details of how BloombergGPT was built but said it would not release the model for evaluation, following in the footsteps of OpenAI, which charges for access to the closed-source GPT-4 announced last month. Bloomberg’s business model revolves around the proprietary data and algorithms it uses to provide intelligence to traders and analysts, and opening up BloombergGPT could expose core assets such as FinPile, the primary collection of documents used to train the model.

The researchers also pointed to uncertainties about the toxicity and ethical use of large language models, concerns that have grown as more users try ChatGPT. The company has locked down BloombergGPT partly for those security reasons.

“Each decision reflects a combination of factors, including model use, potential harm, and business decisions,” the researchers said.

The company plans to keep building out the model as it feeds more data into the system and sorts out issues.


