Tokyo, May 22, 2023 — Tokyo Institute of Technology (Tokyo Tech), Tohoku University, Fujitsu Limited, and RIKEN today announced that they will embark on the research and development of distributed training of large language models (LLMs) (1) on the supercomputer Fugaku, starting in May 2023, within the scope of the initiative for Fugaku use stipulated by Japanese national policy.
LLMs are deep learning AI models at the core of generative AI such as ChatGPT (2). Through this research and development, the four organizations will build an environment for creating LLMs that can be widely used by academia and companies and, by publishing the results, aim to contribute to improving Japan's AI research capabilities and to increase the value of Fugaku in both academic and industrial fields.
Background
While many expect LLMs and generative AI to play a fundamental role in the research and development of technologies for security, the economy, and society at large, advancing and refining these models requires high-performance computing resources that can efficiently process large amounts of data.
Against this background, Tokyo Institute of Technology, Tohoku University, Fujitsu, and RIKEN will focus their R&D on distributed training methods for LLMs.
Implementation period
From May 24, 2023 to March 31, 2024 *Within the period of the initiative to utilize Fugaku stipulated by Japanese national policy
Role of each organization/company
The technology developed in this project will enable efficient training of large language models on the large-scale parallel computing environment of the supercomputer Fugaku. The roles of each organization and company are as follows.
Tokyo Institute of Technology: Overall supervision; parallelization and acceleration of LLMs
Tohoku University: Collection of training data; selection of models
Fujitsu: Acceleration of LLMs
RIKEN: Distributed parallelization and communication acceleration of LLMs; acceleration of LLMs
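The division of labor above centers on distributed, data-parallel training: each compute node processes its own shard of the data, and gradients are averaged across nodes by collective communication so that every model replica stays in sync. The release does not describe the project's actual implementation; as an illustration only, here is a minimal NumPy sketch of that pattern, with workers simulated in a single process and the all-reduce replaced by a plain average:

```python
# Illustrative sketch of data-parallel training (NOT the project's actual
# code). Each simulated worker computes gradients on its own data shard;
# an all-reduce-style average keeps every replica's parameters identical.
import numpy as np

def local_gradient(w, x_shard, y_shard):
    # Gradient of mean squared error for a toy linear model y = w * x.
    pred = x_shard * w
    return np.mean(2.0 * (pred - y_shard) * x_shard)

def all_reduce_mean(grads):
    # Stand-in for the collective communication (e.g. an MPI all-reduce)
    # that a real distributed trainer would perform over the interconnect.
    return sum(grads) / len(grads)

def train_data_parallel(x, y, n_workers=4, lr=0.1, steps=50):
    w = 0.0
    shards_x = np.array_split(x, n_workers)  # one shard per worker
    shards_y = np.array_split(y, n_workers)
    for _ in range(steps):
        grads = [local_gradient(w, sx, sy)
                 for sx, sy in zip(shards_x, shards_y)]
        # Every replica applies the same averaged update, so all copies
        # of the parameters remain identical after each step.
        w -= lr * all_reduce_mean(grads)
    return w

rng = np.random.default_rng(0)
x = rng.normal(size=400)
y = 3.0 * x  # true weight is 3.0
w = train_data_parallel(x, y)
print(round(w, 2))  # converges close to 3.0
```

With equal shard sizes, the average of the per-shard gradients equals the gradient over the full dataset, which is why data parallelism reproduces single-node training while spreading the compute across nodes.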
Future plans
To support LLM development by Japanese researchers and engineers, the four organizations will publish the research results obtained within the scope of the initiative to utilize Fugaku, as stipulated by Japanese national policy, on GitHub (3) and Hugging Face (4) by the end of this fiscal year. Many researchers and engineers are expected to participate in improving the basic models and in new applied research, creating efficient methods that lead to next-generation innovative research and business results. The four organizations will also consider collaborating with Nagoya University, which develops data generation and learning methods for multimodal applications in industrial fields such as manufacturing, and with CyberAgent, Inc., which provides data and technology for building LLMs.
Comment from Toshio Endo, Professor, Global Scientific Information and Computing Center, Tokyo Institute of Technology:"In this collaboration, we will parallelize and accelerate large-scale language models on the supercomputer Fugaku by combining Tokyo Tech's and RIKEN's expertise in parallelization and acceleration, Fujitsu's high-performance computing platform software for Fugaku and AI model performance tuning, and Tohoku University's natural language processing technology. In collaboration with Fujitsu, we will also make use of the small-scale research laboratory established in 202X under the name 'Fujitsu Next-Generation Computing Infrastructure Collaborative Research Center.' By utilizing the large-scale distributed deep learning capability provided by Fugaku, we hope to contribute, together with everyone involved, to the improvement of Japan's AI research capabilities."
Comment from Kentaro Inui, Professor, Graduate School of Information Sciences, Tohoku University:"We aim to build large-scale language models that are open source, commercially usable, and primarily based on Japanese data. By ensuring the transparency and traceability of training data, we hope to promote research robust enough to scientifically examine issues such as the black-box nature of AI, bias, misinformation, and the so-called 'hallucination' phenomenon. We will build a large-scale model using the knowledge of deep learning and Japanese natural language processing developed at Tohoku University. We look forward to sharing the research results obtained through this effort with researchers and developers and to contributing to the improvement of AI research capabilities both in Japan and abroad."
Comment from Seiji Okamoto, Executive Officer, General Manager of Fujitsu Research, Fujitsu Limited:"We are excited about the opportunity to leverage the powerful parallel computing resources of the supercomputer Fugaku to enhance our research in AI and advance the research and development of LLMs. We aim to provide applications that bring about a paradigm shift and contribute to the realization of a sustainable society."
Comment from Satoshi Matsuoka, Director of the RIKEN Center for Computational Science:"The A64FX (5) CPU has an AI acceleration feature called SVE, but software development and optimization are essential to maximize its capabilities and exploit them for AI applications. This collaboration brings together Japan's experts in LLMs and computer science, including researchers and engineers from RIKEN R-CCS, and I believe it will play an important role in developing the technology to build LLMs on the supercomputer Fugaku. Together with our collaborators, we will contribute to the realization of Society 5.0."
Project name
Distributed Training of Large Language Models on Fugaku (Project number: hp230254)
(1) Large language model: A neural network with hundreds of millions to billions of parameters, pretrained on large amounts of data. Representative recent large-scale models include GPT in language processing and ViT in image processing.
(2) ChatGPT: A large language model for natural language processing developed by OpenAI that supports interactive systems and high-precision automatic text generation.
(3) GitHub: A platform used to publish open source software worldwide.
(4) Hugging Face: A platform used to publish AI models and datasets worldwide.
(5) A64FX: An Arm-based CPU developed by Fujitsu and installed in the supercomputer Fugaku.
About Tokyo Institute of Technology
Tokyo Institute of Technology is at the forefront of research and higher education as Japan’s leading science and technology university. Tokyo Tech researchers excel in fields ranging from materials science to biology to computer science to physics. Founded in 1881, Tokyo Tech enrolls more than 10,000 undergraduate and graduate students each year who grow to become scientific leaders and the industry’s most sought-after engineers. Embodying the Japanese philosophy of monozukuri, which means “technical ingenuity and innovation,” the Tokyo Tech community strives to contribute to society through impactful research. www.titech.ac.jp/english/
About Tohoku University
Tohoku University has 10 faculties, 15 graduate schools, 6 research institutes, and 18,000 students. About 10% of its students come from abroad, contributing to one of the most international academic environments in Japan. Tohoku University was granted Designated National University status by the Japanese government in June 2017 for its excellent learning environment, international outlook, and influence on research. It has also held the #1 spot for the past four years in the Japanese university rankings published annually by Times Higher Education, a list focused on institutional resources, educational quality, and the overall student experience.
Fujitsu's purpose is to make the world more sustainable by building trust in society through innovation. As the digital transformation partner of choice for customers in over 100 countries, our 124,000 employees work to resolve some of the greatest challenges facing humanity. Our range of services and solutions draws on five key technologies: Computing, Networks, AI, Data & Security, and Converging Technologies, which we bring together to deliver sustainability transformation. Fujitsu Limited (TSE: 6702) reported consolidated revenues of ¥3.7 trillion (US$28 billion) for the fiscal year ended March 31, 2023 and remains the top digital services company in Japan by market share. For more information, please visit www.Fujitsu.com.
The RIKEN Center for Computational Science (R-CCS) is a leadership center for high-performance computing that explores "the science of computing, by computing, and for computing." Key technologies arising from this exploration, such as open source software, form its core competence, and R-CCS strives to strengthen that competence and spread its technology around the world. Together with Fujitsu, R-CCS developed the world-leading supercomputer Fugaku, which began full-scale operation in March 2021, delivering an order-of-magnitude improvement in computing power and synergies with other IT ecosystems such as big data and artificial intelligence. Previously, R-CCS operated the K computer (2012–2019), which produced many world-leading scientific and technological achievements not only in academia but also in industry.
Source: Fujitsu; Tokyo Institute of Technology; RIKEN; Tohoku University