Editor's note: Nvidia offers a free online course, "AI for All: From Basics to GenAI Practice." As the RCR Wireless News team works through it, we're posting recap articles for each unit, with a bit of extra context drawn from our ongoing coverage of AI infrastructure. Think of this as us trying to get better at our jobs and, maybe along the way, helping with your own professional development. That's the hope, at least.
Generative AI (gen AI) is about creating, or generating, new data across modalities such as text, images and video, based on existing data from those modalities. This is done by synthesizing patterns from the training dataset into a foundation model.
Foundation models, typically large language models (LLMs), serve as the basis for a gen AI system by providing a framework for understanding complex linguistic structures, semantics and contextual nuances. These neural networks are trained on large, unlabeled datasets using unsupervised learning. The idea is to avoid the costly and time-consuming process of labeling the data before training the model.
The transformer architecture
LLMs use the transformer architecture, a specialized neural network architecture, to understand the patterns and relationships in text data. The development of the transformer was a breakthrough in the arc of gen AI. It was described in a 2017 paper, "Attention Is All You Need," written by a team of Google researchers.
Prior to transformers, models relied on recurrent neural networks (RNNs) or convolutional neural networks (CNNs). The transformer architecture is instead built on attention mechanisms, particularly self-attention, which allows the model to weigh the importance of different words in a sentence regardless of where they fall in the sequence. As a result, models can capture long-range dependencies and relationships within the data, and training and inference can be parallelized far more efficiently. In short, the transformer architecture pushed AI from sequential processing to parallel processing.
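To make the idea concrete, here is a minimal sketch of scaled dot-product attention, the core operation the paper introduced, in plain Python. The toy 2-dimensional token vectors are made up for illustration; real models use learned, high-dimensional query, key and value projections.

```python
import math

def softmax(xs):
    # Exponentiate (shifted for numerical stability) and normalize
    # so the attention weights sum to 1.
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over toy vectors (lists of floats)."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Score each key against the query, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # The output is the attention-weighted average of the value vectors.
        out = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Three "tokens", each with a made-up 2-d representation.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Self-attention: queries, keys and values all come from the same tokens,
# so every token attends to every other token, regardless of position.
out = attention(x, x, x)
```

Because every token attends to every other token in one step, all of these dot products can be computed at once, which is the parallelism advantage over RNNs described above.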
Parallel processing of tokens and GPU acceleration
In the context of LLMs, a token is the smallest unit of meaning in a language, such as a word, subword, character or other symbol that represents a linguistic element. An input prompt is broken down into tokens and fed to the model. The model then uses those tokens to predict the next token in the sequence, appending each prediction to the input, until it reaches a designated stopping point.
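That loop can be sketched in a few lines. Here a hand-made bigram lookup table stands in for a trained model, and whitespace splitting stands in for a real tokenizer; both are placeholders for illustration only.

```python
# Toy next-token loop: a hard-coded bigram table stands in for an LLM,
# and the "<end>" token marks the designated stopping point.
bigram = {
    "the": "cat",
    "cat": "sat",
    "sat": "down",
    "down": "<end>",
}

def generate(prompt, max_tokens=10):
    tokens = prompt.split()          # crude whitespace "tokenizer"
    for _ in range(max_tokens):
        next_token = bigram.get(tokens[-1], "<end>")
        if next_token == "<end>":    # stopping point reached
            break
        tokens.append(next_token)    # feed the prediction back in
    return " ".join(tokens)

print(generate("the"))               # → "the cat sat down"
```

A real LLM predicts a probability distribution over tens of thousands of tokens at each step rather than a single fixed successor, but the append-and-repeat structure is the same.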
Doing all of that requires computationally intensive, complex mathematics for which GPUs, compared with CPUs, are uniquely suited, given the chips' capabilities for large-scale parallelism, matrix operations and high memory bandwidth. For example, early versions of ChatGPT were reportedly trained on some 10,000 Nvidia GPUs over several weeks.
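The matrix operations at the heart of that workload illustrate why: every output cell is an independent dot product, so thousands of them can be computed at once. Here is a naive sketch in plain Python; a GPU would compute all of the cells concurrently rather than in nested loops.

```python
def matmul(a, b):
    """Naive matrix multiply over lists of lists. Each output cell is an
    independent dot product, which is exactly the kind of work a GPU
    spreads across thousands of cores in parallel."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]

# 2x2 example: each of the four output cells could run on its own core.
c = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
# c == [[19, 22], [43, 50]]
```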
Gen AI vs. Traditional AI for businesses, and how to get started
Traditional AI is about understanding historical data and making accurate predictions, using techniques like classification, pattern recognition, text-to-speech and optical character recognition. Gen AI excels at creating new data based on patterns and trends learned from training data; it is marked by unsupervised learning, chatbots, summarization tools and copilots. There is overlap between traditional AI and gen AI; if you picture a Venn diagram, the intersection includes sentiment analysis and language translation.
That's not to say the two are mutually exclusive. In fact, they are complementary, and traditional AI remains well suited to a variety of tasks, delivering the desired results with less cost, complexity and computational overhead. For businesses looking to leverage gen AI, the central consideration is adapting general-purpose LLMs to business-specific use cases and applications. That adaptation process ranges from moderate customization, by fine-tuning pre-trained models, to building custom models on your own data. In either case, customizing an LLM comes with additional costs, in both computational resources and the required expertise.
As for getting started with gen AI, here is a relatively generalized process:
- Identify business opportunities – target use cases that have meaningful business impact and can be customized with your unique data.
- Build AI resources – take stock of existing compute resources and internal talent, and identify the partners you will need.
- Analyze training/customization data – access, refine and protect the data needed to build a data-intensive foundation model or customize an existing one.
- Invest in accelerated infrastructure – assess infrastructure, architecture and operating model, and weigh the costs involved, including energy costs.
- Practice responsible AI – create a plan to leverage tools and best practices so that responsible AI principles are adopted internally.
Here is the workflow to reach a production-ready application:
- Data Collection – Collect and prepare data to train or fine-tune LLMs to ensure that the data is diverse and representative of the target domain.
- Data Curation – Clean, filter and organize your captured data.
- Pre-Training – Expose the model to a vast corpus of text data to learn language patterns, relationships and representations.
- Customization – Adapt the general model to the specific requirements of a task or domain to improve accuracy, efficiency and effectiveness.
- Evaluating Models – Measure how well the models learn from training data and how accurately they can make predictions with new data.
- Inference – The deployed model processes input data to generate outputs.
- Guardrails – Keep guardrails in mind; they are important to mitigate risk and ensure the ethical, responsible and safe use of AI models and their attendant applications.
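The workflow above can be sketched end to end as a pipeline of stages. Every function below is a hypothetical stub standing in for what is, in practice, a large tooling ecosystem at each step; the data and the "model" are placeholders for illustration only.

```python
# Hypothetical end-to-end pipeline mirroring the workflow above.
# Each function is a stand-in stub; real systems use dedicated tooling
# per stage (data platforms, training frameworks, evaluation harnesses).

def collect_data():
    # Data collection: gather raw, possibly messy examples.
    return ["raw example 1", "  raw example 2  ", "raw example 1"]

def curate(data):
    # Data curation: clean, filter and deduplicate.
    cleaned = [d.strip() for d in data]
    return sorted(set(cleaned))

def pretrain(corpus):
    # Pre-training: this stand-in "model" just records its corpus size.
    return {"corpus_size": len(corpus), "customized": False}

def customize(model, domain_data):
    # Customization: adapt the general model to a specific domain.
    return dict(model, customized=True, domain_size=len(domain_data))

def evaluate(model):
    # Evaluation: a real harness would score held-out data;
    # here we only sanity-check that customization happened.
    return model["customized"]

def infer(model, prompt):
    # Inference, gated by evaluation; production guardrails would also
    # filter the prompt and the response here.
    if not evaluate(model):
        raise ValueError("model failed evaluation")
    return f"response to: {prompt}"

data = curate(collect_data())
model = customize(pretrain(data), domain_data=["domain doc"])
print(infer(model, "hello"))
```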
For more information about the AI 101 series, see below.
