Japanese researchers release “Fugaku-LLM”

A team of Japanese researchers has used RIKEN's supercomputer “Fugaku” to train and release “Fugaku-LLM” (1), a large-scale language model with enhanced Japanese language ability. The team consists of Professor Rio Yokota of Tokyo Institute of Technology, Associate Professor Keisuke Sakaguchi of Tohoku University, Koichi Shirahata of Fujitsu Limited, Team Leader Mohamed Wahib of RIKEN, Associate Professor Koji Nishiguchi of Nagoya University, Shota Sasaki of CyberAgent, Inc., and Noriyuki Kojima of Kotoba Technologies Inc.

To train large-scale language models on Fugaku, the researchers developed distributed training methods, including porting the deep learning framework Megatron-DeepSpeed to Fugaku to optimize the performance of Transformers there. They accelerated the dense matrix multiplication library used by the Transformer, optimized Fugaku's communication performance by combining three types of parallelization techniques, and sped up the collective communication library for the Tofu Interconnect D.
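
The release does not spell out how the three parallelization axes were configured; as a rough sketch (in Python, with entirely hypothetical degrees), data, tensor, and pipeline parallelism multiply together to cover the full node count:

    # Hedged illustration: how three parallelization axes combine to fill a
    # machine the size of Fugaku. The degrees actually used are not public here.
    def data_parallel_degree(world_size: int, tensor: int, pipeline: int) -> int:
        """Return the data-parallel degree implied by the other two axes."""
        assert world_size % (tensor * pipeline) == 0
        return world_size // (tensor * pipeline)

    # Hypothetical split of the 13,824 nodes used for training:
    # 4-way tensor x 24-way pipeline x 144-way data = 13,824 nodes.
    print(data_parallel_degree(world_size=13_824, tensor=4, pipeline=24))  # 144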

  • Fugaku-LLM has 13 billion parameters (2), larger than the roughly 7-billion-parameter models widely developed in Japan.
  • Fugaku-LLM has enhanced Japanese capabilities, with an average score of 5.5 on the Japanese MT-Bench (3), the best performance among open models trained from scratch on original data produced in Japan. In particular, its benchmark score on humanities and social sciences tasks reached a remarkably high 9.18.
  • Fugaku-LLM was trained using proprietary Japanese and English data collected by CyberAgent, along with other data. The Fugaku-LLM source code is available on GitHub (4) and the model is available on Hugging Face (5); a minimal loading sketch follows this list.
  • Fugaku-LLM can be used for research and commercial purposes as long as the license is followed.
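
As a usage illustration, the model can presumably be loaded with the standard Hugging Face transformers API; the model id below is an assumption, so check the model card for the exact id and prompt format:

    # Minimal loading sketch; "Fugaku-LLM/Fugaku-LLM-13B" is an assumed id.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Fugaku-LLM/Fugaku-LLM-13B"  # verify on Hugging Face
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    prompt = "スーパーコンピュータ「富岳」とは"  # "The supercomputer Fugaku is..."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))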


Background

In recent years, the development of large-scale language models (LLMs) has been active, especially in the United States. In particular, the rapid adoption of ChatGPT (6), developed by OpenAI, is having a major impact on research and development, economic systems, and national security. Countries other than the United States are also investing significant human and computational resources in developing LLMs of their own. To keep up with this global competition, Japan too needs to secure computational resources for AI research. Expectations are high for Fugaku, Japan's flagship supercomputer system, and meeting them requires a computational environment for large-scale distributed training on Fugaku.

Therefore, Tokyo Institute of Technology, Tohoku University, Fujitsu, RIKEN, Nagoya University, CyberAgent, and Kotoba Technologies have started a joint research project to develop large-scale language models.

Role of each institution/company in Fugaku-LLM development

  • Tokyo Institute of Technology: Overall monitoring, parallelization, and communication acceleration of large-scale language models (optimization of communication performance by combining three types of parallelization, acceleration of collective communication in Tofu interconnect D)
  • Tohoku University: Collection of training data and model selection
  • Fujitsu: Acceleration of computation and communication (acceleration of collective communication on the Tofu Interconnect D, performance optimization of pipeline parallelization) and execution of pre-training and post-training fine-tuning
  • RIKEN: Distributed parallelization and communication speedup of large-scale language models (speedup of collective communication on Tofu interconnect D)
  • Nagoya University: Study of how to apply Fugaku-LLM to 3D generative AI
  • CyberAgent: Provision of training data
  • Kotoba Technologies: Porting deep learning framework to Fugaku
[Image: RIKEN's supercomputer “Fugaku” ©RIKEN]

Research results

1. Significantly improved training performance of large-scale language models on the supercomputer “Fugaku”

GPUs (7) are the popular choice of hardware for training large-scale language models. However, there is a global shortage of GPUs as many countries invest heavily in training LLMs. In this situation, it is important to show that large-scale language models can be trained using Fugaku, which uses CPUs instead of GPUs. The CPUs used in Fugaku are Japanese-made CPUs manufactured by Fujitsu, and they play an important role in revitalizing Japanese semiconductor technology.

In this research, by drawing out Fugaku's potential, the team succeeded in increasing the speed of matrix multiplication by a factor of 6 and communication speed by a factor of 3. To maximize distributed training performance, the deep learning framework Megatron-DeepSpeed was ported to Fugaku and the dense matrix multiplication library used by the Transformer was accelerated.
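
To see where that 6x matters, note that Transformer training time is dominated by dense matrix multiplications (GEMMs); the toy example below uses NumPy and assumed 13B-class layer sizes purely for illustration:

    # Illustration only: the feed-forward GEMM that dominates Transformer
    # compute. On Fugaku this call would hit an A64FX-optimized BLAS.
    import numpy as np

    tokens, hidden, ffn = 2_048, 5_120, 20_480  # assumed 13B-class sizes
    x = np.random.rand(tokens, hidden).astype(np.float32)  # activations
    w = np.random.rand(hidden, ffn).astype(np.float32)     # layer weights
    y = x @ w  # accelerating this multiplication speeds up the whole run
    print(y.shape)  # (2048, 20480)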

To increase communication speed, the team combined three types of parallelization techniques to optimize Fugaku's communication performance and sped up collective communication on the Tofu Interconnect D. The knowledge gained through these efforts can be applied to the design of next-generation computing infrastructure after Fugaku and will greatly enhance Japan's future advantage in the AI field.
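
The core collective here is an allreduce: every data-parallel replica contributes its gradient and receives the sum. A minimal sketch with mpi4py (standing in for Fugaku's Tofu-optimized collectives) looks like this:

    # Run with e.g.: mpiexec -n 4 python allreduce_sketch.py
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    grad = np.full(4, comm.Get_rank(), dtype=np.float64)  # stand-in gradient
    total = np.empty_like(grad)
    comm.Allreduce(grad, total, op=MPI.SUM)  # sum gradients across replicas
    total /= comm.Get_size()                 # average them
    if comm.Get_rank() == 0:
        print("averaged gradient:", total)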

2. Easy-to-use, open, and secure large language model with 13 billion parameters

In 2023, many large-scale language models were developed by Japanese companies, most of them with fewer than 7 billion parameters.

Large-scale language models generally perform better as the number of parameters increases, so the 13-billion-parameter model developed by the research team is likely more capable than other Japanese models. Although even larger models have been developed overseas, large language models also require large amounts of computational resources, so models with too many parameters are difficult to use. Fugaku-LLM balances high performance with practicality.
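
A back-of-the-envelope calculation (ours, not from the release) shows why parameter count drives resource needs:

    # 13 billion parameters at 2 bytes each (fp16) is ~26 GB for weights alone;
    # training additionally needs gradients, optimizer state, and activations.
    params = 13e9
    print(f"{params * 2 / 1e9:.0f} GB of fp16 weights")  # ~26 GB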

Furthermore, most models developed by Japanese companies employ continuous learning (8), in which open models developed outside Japan receive additional training on Japanese data. Fugaku-LLM, by contrast, was trained from scratch on the team's own data, so the entire training process can be inspected, making it superior in terms of transparency and safety.

Fugaku-LLM was trained on 380 billion tokens using 13,824 nodes of Fugaku, with approximately 60% of the training data in Japanese and the rest a mix of English, mathematics, and code. Compared with models that continue training existing models on Japanese data, Fugaku-LLM learned much of its knowledge directly in Japanese. It is the best domestically produced open model trained on proprietary data; in particular, it achieved a high benchmark score of 9.18 on humanities and social sciences tasks, and it is expected to support natural dialogue that reflects characteristics of Japanese such as honorific speech.
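
In round numbers (ours, based only on the figures above), that mixture works out as follows:

    # ~60% of 380B tokens is Japanese; the exact non-Japanese split is not given.
    total = 380e9
    japanese = 0.60 * total
    print(f"Japanese: ~{japanese/1e9:.0f}B tokens, "
          f"other: ~{(total - japanese)/1e9:.0f}B tokens")
    # Japanese: ~228B tokens, other: ~152B tokens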

Future development

The results of this research are publicly available through GitHub and Hugging Face, allowing other researchers and engineers to use them to further develop large-scale language models.

Fugaku-LLM can be used for research and commercial purposes as long as the license is followed. It will also be available to users via the Fujitsu Research Portal starting May 10, 2024.

In the future, as more researchers and engineers participate in improving the model and its applications, training efficiency will rise, leading to next-generation innovative research and business applications, such as linking scientific simulation with generative AI and simulating virtual communities with thousands of AIs.

Acknowledgments

This research was supported by the Fugaku policy support proposal “Development of distributed parallel learning for large-scale language models using Fugaku” (proposal number: hp230254).

[1] Large language model: A model of the probability with which text occurs; it can predict the text (response) that follows a given context (query).
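
In the standard notation (not specific to Fugaku-LLM), this autoregressive factorization is written as:

    p(x_1, \dots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_1, \dots, x_{t-1})

where each next token x_t is predicted from the preceding context.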

[2] Parameters: A measure of the size of neural networks. The more parameters, the better the model performs, but the more data is required for training.

[3] Japanese MT-Bench: A benchmark test provided by Stability AI.

[4] GitHub: A platform used to publish open-source software.

[5] Hugging Face: A platform used to publish AI datasets and models.

[6] ChatGPT: A large-scale language model developed by OpenAI that has brought about major social change, exceeding 100 million users within about two months of its release.

[7] GPU: Originally created as an accelerator for graphics, now also widely used to accelerate deep learning.

[8] Continuous learning: A method of performing additional training on large language models that are already trained, used to adapt language models to different languages or domains.


