The road to Zettascale and quantum computing is long and winding

In the United States, the first steps toward exascale HPC systems began in 2007 with a series of workshops. It was a decade and a half later that the 1,686 petaflops “Frontier” system at Oak Ridge National Laboratory came online. This year, Argonne National Laboratory is preparing to switch on “Aurora,” which will be either the second or the third exascale machine in the United States, depending on when the “El Capitan” system goes live at Lawrence Livermore National Laboratory.

All of these machines have faced delays and setbacks along the way to exascale, as well as changing technology, continued competition from China, and other challenges. But Rick Stevens, associate laboratory director for Computing, Environment and Life Sciences at Argonne National Laboratory, doesn’t expect the next leap, to zettascale or to quantum computing, to come any sooner. Both could take another 15 to 20 years or more.

This is the nature of HPC.

“This is the long game,” Stevens said at a recent webinar on the near and far future of computing in HPC. “If you’re interested in what happens next year, HPC isn’t the game for you, because we’re on a 1,000-year trajectory. These are just the early stages. Yes, Moore’s Law has worked. Humanity isn’t ending tomorrow. There’s still a long way to go, so we have to think about what high-performance computing will mean 10 years from now. What will it mean 20 years from now?”

The central “now” part of Stevens’ talk was AI. AI-enhanced HPC applications and research areas include AI-managed simulations and surrogates, purpose-built AI accelerators, and the role AI will play in the development of large-scale systems. He noted an explosion of activity in the AI space from 2019 to 2022, with much of that period spent in the COVID-19 pandemic.

As large language models, which are central to tools such as the popular ChatGPT and other generative AI chatbots, and text-to-image deep learning via Stable Diffusion took off, AI techniques were used to fold billions of proteins and to make progress on open mathematics problems. There was massive adoption of AI among HPC developers, and AI was used to accelerate HPC applications. In addition, exascale systems began to emerge.

“This explosion continues in that more and more groups are building large-scale models, and most of these models are in the private sector,” Stevens said. “Only a handful of them are being built by nonprofits, and many of the models are closed source, including GPT-4, which is currently the best. It shows that we’re moving towards a relatively small number of very powerful models, and that’s the kind of important meta thing that’s happening right now.”

Simulations, surrogates, emerging AI applications, and AI use cases will all require even more computing power in the years to come. The Argonne Leadership Computing Facility (ALCF) in Illinois is beginning to look into this as it plans for its post-Aurora machine and beyond. Stevens and his colleagues envision a system that is eight times more powerful than Aurora, with a request for proposals in the fall of 2024 and installation by 2028 or 2029. For low-precision arithmetic, Stevens said, machines approaching half a zettaflop should be possible two or three iterations from now.
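To put the scale in perspective, the unit prefixes can be worked through in a few lines of Python. This is a rough sketch that uses only figures quoted in this article; the constant and function names are illustrative, and the final ratio sets Frontier’s headline figure against a low-precision target, so it should not be read as a like-for-like performance comparison.

```python
# Unit prefixes for floating point operations per second (flops).
PETA = 1e15
EXA = 1e18
ZETTA = 1e21

# Figures quoted in the article.
FRONTIER_FLOPS = 1686 * PETA          # Frontier's quoted 1,686 petaflops
LOW_PRECISION_TARGET = 0.5 * ZETTA    # "half a zettaflop" at low precision


def to_exaflops(flops: float) -> float:
    """Express a raw flops figure in exaflops."""
    return flops / EXA


def scale_factor(target_flops: float, baseline_flops: float) -> float:
    """How many times larger the target is than the baseline.

    Caveat: the two figures here are quoted at different numerical
    precisions, so this is a size comparison only.
    """
    return target_flops / baseline_flops


if __name__ == "__main__":
    print(f"Frontier: {to_exaflops(FRONTIER_FLOPS):.3f} exaflops")
    print(f"Half a zettaflop: {to_exaflops(LOW_PRECISION_TARGET):.0f} exaflops")
    factor = scale_factor(LOW_PRECISION_TARGET, FRONTIER_FLOPS)
    print(f"Ratio (mixed precision, indicative only): {factor:.0f}x")
```

One zettaflop is 1,000 exaflops, so even the low-precision half-zettaflop target sits two to three orders of magnitude above today’s exascale headline numbers.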

One question concerns the accelerators in such systems. Will they be new versions of the general-purpose GPUs in use today, GPUs that have been enhanced with something specialized for AI simulations, or entirely new engines optimized for AI?

“That is the fundamental question. We know that simulation will continue to be important and we will need high-performance, high-precision numerics, but what proportion it will be relative to AI is an open question,” he said. “Various centers around the world thinking about the next generation will have to make similar decisions about how much they will lean towards the AI market and AI application base going forward.”

ALCF has built an AI testbed using systems from Cerebras Systems, SambaNova Systems, Graphcore, Intel’s Habana Labs division, and Groq. It includes accelerators designed for AI workloads and is meant to show whether these technologies are maturing quickly enough to serve as the foundation for large-scale systems and effectively run HPC machine learning applications.

“The question is, is a general-purpose GPU fast enough for that scenario and tightly coupled with the CPU enough that it’s still the right solution, or will something else come along in that time frame?” he said, adding that the issue of multi-tenancy support will be key. “If you have an engine that uses some subset of your nodes, how can you support some applications within that subset? There are a lot of open questions about how to do this.”

Some of those questions are outlined below.

There is also the question of how to build these new big systems. A new wave of technology, such as a change in cooling or power systems, usually means a major upgrade of the entire infrastructure. Stevens said the idea of a more modular design, where components can be switched but the system itself remains intact, makes more sense. Modules in the system may be larger than the current node, but can be replaced periodically without upgrading the entire infrastructure.

“Is there a base with power, cooling, and perhaps passive optical infrastructure, and then modules that would be replaced more frequently, aligned with fab nodes and with very simple interfaces?” he said. “They have power connectors, optical connectors, and cooling connectors. How can we make it easier to upgrade in 2-year increments instead of 5-year increments?”

ALCF is working on these issues more actively than in previous years, given that the Department of Energy’s Office of Science has assets such as exascale computing and data infrastructure, large-scale experimental facilities, and a large code base for scientific simulations. There are also many interdisciplinary teams that cut across disciplines and laboratories. According to Stevens, the Exascale Computing Project had 1,000 people working together.

Automation is another factor. Argonne and other laboratories have all these large machines and numerous applications, he said. Can we find a way to automate much of the work, such as creating and managing AI surrogates, to make the process faster, easier, and more efficient? That’s another area of ongoing research.

While all of this work is underway, development of zettascale and quantum systems is moving at its own pace, and Stevens expects neither to see widespread use for another 15 to 20 years. By the end of the decade, it will be possible to build low-precision zettascale machines, but how useful such systems will be remains unclear. Eventually it will be possible to build such machines at 64-bit precision, but that is probably not until 2035 at the earliest. (Not the 2027 that Intel was talking about to The Next Platform around October 2021.)

When it comes to quantum, the associated costs are just as important as the technology. Running an application for two weeks on an exascale machine costs about $7 million in compute time. On a quantum machine scaled up to 10 million qubits (which doesn’t exist yet), running the problem could cost $5 billion to $20 billion. The cost would have to drop by orders of magnitude before people would pay to solve a large-scale problem.
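The size of that cost gap is easy to check with a little arithmetic. The sketch below uses only the dollar figures quoted above; the function and constant names are illustrative, and the result is a back-of-the-envelope ratio, not a cost model.

```python
import math

# Dollar figures quoted in the article.
EXASCALE_RUN_COST = 7e6        # two weeks of compute time on an exascale machine
QUANTUM_RUN_COST_LOW = 5e9     # low estimate for a 10-million-qubit machine
QUANTUM_RUN_COST_HIGH = 20e9   # high estimate for the same machine


def cost_gap(quantum_cost: float, classical_cost: float = EXASCALE_RUN_COST):
    """Return (ratio, orders_of_magnitude) between the two run costs."""
    ratio = quantum_cost / classical_cost
    return ratio, math.log10(ratio)


if __name__ == "__main__":
    for label, cost in (("low", QUANTUM_RUN_COST_LOW), ("high", QUANTUM_RUN_COST_HIGH)):
        ratio, orders = cost_gap(cost)
        print(f"{label} estimate: {ratio:,.0f}x the exascale cost "
              f"(about {orders:.1f} orders of magnitude)")
```

On these figures the quantum run is several hundred to a few thousand times more expensive than the exascale run, which is roughly three orders of magnitude.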

“What this tells us is that we have to keep advancing classical computing while quantum evolves, because we know we can solve real problems using classical computing,” he said. “That’s actually kind of the argument for it. We think zettascale progress is going to take 15 to 20 years as well, but that’s a time frame we’re pretty confident in, and we know we can actually use these machines.”

All of this goes back to the initial theme that HPC innovation takes a long time. Quantum-classical hybrid systems may eventually be the way to go. The industry may need to switch its computational underpinnings to molecular, optical, or something yet to be invented. Engineers, scientists, and others will have to think broadly.

“AI is currently changing the landscape most rapidly, but we are still just scratching the surface of how to restructure systems to be ideal platforms for performing large-scale AI computations,” said Stevens. “This could be so transformative that if we were having this conversation 10 years from now, maybe something else will have happened. Or maybe not. I think it’s going to be somewhere in between. It’s going to be a long game and there’s going to be a lot of disruption, and what we’ve got to get used to is finding ways to navigate disruptions, not fight them, because disruptions are our friends. In fact, they give us new capabilities, and we should actively seek them out.”
