“The AI iPhone moment has begun,” Nvidia CEO Jensen Huang said during a keynote at the company’s GTC conference this week.
He was referring to OpenAI’s ChatGPT, which has gained millions of users in just six months. User queries are processed by Nvidia’s GPUs, which return the responses to the user.
Similar to Apple, which owns a hardware and software stack, Nvidia uses its own hardware and software tools to lock developers into its ecosystem. To that effect, Nvidia announced at its GPU Technology Conference a new workflow for developing applications like ChatGPT using large language models.
Indeed, Nvidia, like Apple, has the best hardware and software stacks for artificial intelligence. But once a developer is trapped in Nvidia’s ecosystem, it can be expensive and difficult to get out.
But Nvidia dominates AI and has an edge by getting coders on its side. Nvidia’s GPUs perform best when coded with the chipmaker’s CUDA parallel programming framework, which compiles code and dispatches work and data to the GPU.
Huang emphasized that the move to generative AI is underway. Tools like ChatGPT can automate coding, but there are new considerations that developers need to take into account, such as tuning code to hardware accelerators to provide the fastest results.
Traditional coding relies on CPU parallelism and hits a performance wall. When using AI, the software has to interact with specialized accelerators such as GPUs. A GPU takes a query, weighs various parameters, and returns the best possible answer.
Open Source CUDA
Nvidia has open sourced CUDA libraries to ease migration of workloads to the programming framework. Developers can integrate these libraries into their applications as needed, easing the transition to CUDA and GPU acceleration.
“Accelerated computing is not easy. It requires the invention of the full stack, from chips, systems, networking and acceleration libraries to application refactoring,” Huang said in his keynote.
Users were sometimes unable to use ChatGPT because its servers had reached peak capacity. With the surge of interest in generative AI, there is a shortage of hardware to run the algorithms. First dibs on AI hardware go to cloud-native companies like Facebook, Google, and Microsoft, which are designing data centers to handle AI applications.
There is a fundamental coding problem: ISO C++ has no native way to target GPU parallelism. Programmers can use frameworks such as Nvidia’s CUDA to recompile their code to take advantage of the computing power of GPUs.
CUDA provides libraries and frameworks, including a highly tuned math library, a core library of data structures and algorithms, and a communication library for scaling up applications. CUDA supports C++, Fortran and Python, and works with software libraries such as TensorFlow and PyTorch.
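The data-parallel model that CUDA exposes can be illustrated with a minimal sketch. This is plain Python, not CUDA, and it only emulates the idea: the same small function (a "kernel") runs once per thread, and each thread works out which array element it owns from its block and thread indices. The names `saxpy_kernel` and `launch` are hypothetical and chosen for illustration; on a real GPU the threads would run concurrently rather than in a loop.

```python
import math

def saxpy_kernel(block_idx, thread_idx, block_dim, a, x, y, out):
    """Work done by one emulated GPU thread: out[i] = a * x[i] + y[i]."""
    # Global element index, analogous to blockIdx.x * blockDim.x + threadIdx.x in CUDA.
    i = block_idx * block_dim + thread_idx
    # Guard: the last block may contain threads that fall past the end of the data.
    if i < len(x):
        out[i] = a * x[i] + y[i]

def launch(kernel, n, block_dim, *args):
    """Hypothetical stand-in for a kernel launch; runs every thread one after another."""
    grid_dim = math.ceil(n / block_dim)  # enough blocks to cover all n elements
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)
    return grid_dim

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 20.0, 30.0, 40.0, 50.0]
out = [0.0] * len(x)
blocks = launch(saxpy_kernel, len(x), 2, 2.0, x, y, out)
print(blocks, out)
```

Five elements with two threads per block need three blocks, so one thread in the last block has nothing to do; the bounds check is what keeps it from writing out of range.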
“It’s not just languages that are supported. [CUDA] …,” Stephen Jones, principal software architect at Nvidia, said at the show.
Code written in many programming languages can be compiled for CUDA, but WebAssembly is not among them.
“I think I’ve heard about some academic projects that people are looking into. I think this is one of the many directions multi-node systems are going. The ubiquity of WebAssembly is very attractive,” said an Nvidia moderator during the CUDA session.
CUDA’s reach extends to recompiling code to run on a quantum computer, which can be simulated on Nvidia hardware. A programmer can take regular code and recompile it with CUDA to see how it behaves in a GPU-simulated quantum computing environment.
But GTC’s focus was squarely on AI and Nvidia’s GPUs. Just four years ago, the in-person GTC conference had 8,000 attendees; this year about 250,000 developers attended the virtual conference, Huang said.
“Generative AI is a new kind of computer… anyone can tell a computer to solve a problem. It used to be a domain exclusively for computer programmers,” Huang said.
This may sound like bad news for coders, but Nvidia executives at the show said accelerated computing will help developers migrate to a new era of probabilistic computing, where computers can reason and predict outcomes.
At a press conference, Manuvir Das, vice president of enterprise computing, said application development is shifting to creating AI models.
Nvidia announced pre-packaged AI models called Foundations, which allow coders and data scientists to develop their own chatbots and image and video generators.
One service, called NeMo, lets you develop ChatGPT-like experiences where AI can generate summaries, find market information, and answer questions. Another, called Picasso, generates images, videos or 3D models, and BioNeMo targets protein structures and other biotechnology applications.
“Customers can bring their own models or start from NeMo’s pre-trained language models, ranging from GPT-8 and GPT-43 to GPT-530 billion parameters,” said Das.
Each model can be connected to its own dataset and will improve over time as more data is added. There are also guardrails to prevent the AI from getting too emotional or giving unwanted answers, as happened with Microsoft’s Bing AI and Google’s Bard.
Developers can access the model through an API. Each modality includes a tuned inference engine, a framework for data processing, and a vector database.
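For readers unfamiliar with the vector database piece, the idea can be sketched in a few lines of Python. This is an illustrative assumption about how such a store works in general, not a description of Nvidia’s implementation: text is stored alongside a numeric embedding, and a query returns the entries whose embeddings point in the most similar direction. The class `TinyVectorStore` and the three-number embeddings are made up for the example; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

class TinyVectorStore:
    """Hypothetical in-memory store; a real vector database adds indexing and persistence."""

    def __init__(self):
        self._items = []  # list of (text, embedding) pairs

    def add(self, text, embedding):
        self._items.append((text, embedding))

    def search(self, query_embedding, k=1):
        # Brute-force scan: score every stored item and keep the k most similar.
        ranked = sorted(
            self._items,
            key=lambda item: cosine_similarity(query_embedding, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]

store = TinyVectorStore()
store.add("refund policy", [0.9, 0.1, 0.0])     # made-up embeddings for illustration
store.add("shipping times", [0.1, 0.9, 0.0])
store.add("office hours", [0.0, 0.2, 0.9])

top = store.search([0.8, 0.2, 0.1], k=2)
print(top)
```

A language model connected to a company’s own dataset would typically run a lookup like this first and pass the retrieved text along with the user’s question.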
No pricing has been given for access to the Foundations models, but judging by other Nvidia hardware and software announcements, it could require a significant investment.
DGX Cloud
NeMo and Picasso services will be accessible via Nvidia’s DGX Cloud hardware announced at the show.
Starting at $37,000 per month, DGX Cloud provides access to the latest GPUs and AI enterprise software toolkits in the cloud. That is about double the price of Nvidia GPU instances in Azure, which run up to about $20,000 per month. DGX Cloud services are offered to customers through public cloud providers such as Azure and Oracle.
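The price comparison can be checked with quick arithmetic. The sketch below uses only the two monthly figures quoted above and assumes, for illustration, that they describe comparable configurations, which the announcement does not establish.

```python
dgx_cloud_monthly = 37_000   # USD, DGX Cloud starting price quoted above
azure_gpu_monthly = 20_000   # USD, approximate upper figure for Azure GPU instances quoted above

ratio = dgx_cloud_monthly / azure_gpu_monthly
annual_difference = (dgx_cloud_monthly - azure_gpu_monthly) * 12

print(f"{ratio:.2f}x the monthly price, ${annual_difference:,} more per year")
```

On these figures the multiple is 1.85, and the gap adds up to $204,000 over a year.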
Justin Boitano, vice president of EGX computing at Nvidia, said investing in AI could help reduce costs.
“At the end of the day, getting the business results you need at a lower cost usually frees up investment in new areas,” says Boitano.
Nvidia has also released other tools like CvCUDA, which does video processing for faster multimedia delivery to smartphones and other devices. CvCUDA provides 30 operators with Python and C++ bindings to use accelerated computing for image warping, video editing, and image processing.
“What we wanted to do was look at the bottlenecks next to AI in TensorFlow and PyTorch and make sure that video pre- and post-processing is very efficient,” said Boitano.
CvCUDA is available in a GitHub repository with public source code. The company works with developers who want to build on that library.
“That openness basically also allows you to eventually fork if you want to, so you can make your own copies and extensions. And they are getting it licensed to run in production,” says Boitano.
Another CUDA tool, called CuOpt, tackles route optimization, a problem Boitano called “the traveling salesman.”
Developers can pull data from ESRI’s ArcGIS cloud service and build cost models for routes. Businesses can optimize which depots to pick up from, the number of vehicles required and store locations, and find routes with the shortest time and lowest cost.
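To give a sense of the kind of problem involved, here is a minimal Python sketch of a classic nearest-neighbor heuristic for the traveling salesman problem. It is not CuOpt’s algorithm or API, and it does not guarantee the shortest route. The stop names, coordinates and the choice of straight-line distance as the cost model are all assumptions made for illustration.

```python
import math

def route_length(route, coords):
    """Total straight-line distance along a sequence of stop names."""
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(route, route[1:]))

def nearest_neighbor_route(coords, start):
    """Greedy heuristic: always drive to the closest unvisited stop, then return to start."""
    unvisited = set(coords) - {start}
    route = [start]
    while unvisited:
        here = coords[route[-1]]
        closest = min(unvisited, key=lambda name: math.dist(here, coords[name]))
        route.append(closest)
        unvisited.remove(closest)
    route.append(start)  # close the loop back at the depot
    return route

# Hypothetical stops on a flat grid; a real cost model would use road distance or travel time.
stops = {
    "depot": (0, 0),
    "store_a": (0, 3),
    "store_b": (4, 3),
    "store_c": (4, 0),
}

route = nearest_neighbor_route(stops, "depot")
total = route_length(route, stops)
print(route, total)
```

With four stops the greedy choice happens to find a sensible loop. Production solvers exist because real instances add vehicles, time windows and capacities, and the number of possible routes grows factorially with the number of stops.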
“We also have an eye on making APIs available to people who just want to use them as a service and not bring up data center infrastructure,” said Boitano.
Microsoft is also betting the future of software and the cloud on Nvidia hardware. The software giant plans to deploy Nvidia’s OVX-2 servers to bring metaverse applications into Microsoft Office 365. Microsoft’s AI-powered Bing runs on Nvidia’s A100 GPUs, and the company is building a supercomputer with Nvidia’s latest H100 GPUs, which are based on an architecture called Hopper.
