Q&A: Can mathematics reveal the depth of deep learning AI?

UNIVERSITY PARK, Pa. — Artificial intelligence (AI) is becoming increasingly pervasive: it is integrated into phone apps, search engines, and social media platforms, and it supports a myriad of research applications. In recent decades, a type of machine learning called deep learning, whose structure is inspired by the neural networks in the human brain, has received particular attention. Deep learning is at the core of the large language models used in OpenAI’s ChatGPT and Microsoft Copilot, for example. More specialized deep learning models have supported a wide range of scientific research, including the work on predicting complex protein structures that was recognized with the 2024 Nobel Prize in chemistry.

One of the benefits of deep learning is that it can recognize patterns and features without explicit human programming, but this process can be opaque. This “black box” nature of deep learning raises questions about how exactly the model works, making model validation and optimization difficult.

In the following Q&A, Penn State mathematics professor Leonid Berlyand and graduate student Oleksiy Krubchitskyi discuss how they are applying mathematical principles to uncover the nature of deep learning’s black boxes.

Q: What is deep learning?

Berlyand: Deep learning is a type of machine learning that uses artificial neural networks to learn from data, similar to how humans learn. These networks, also known as ANNs, were originally developed by computer scientists and were inspired by the structure of the human brain. An ANN consists of nodes connected by edges, typically arranged in layers. Broadly speaking, the nodes are “artificial neurons” and the edges mimic the synapses that connect neurons in the brain. Learning occurs during the training process, in which data is fed into the network and the ANN iteratively adjusts the connection weights to reduce prediction errors.
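
As an illustration that is not part of the interview, the training loop described above can be sketched for the simplest possible case: a single artificial neuron with one weight and one bias. The data, the learning rate, and the function name `train_neuron` are assumptions chosen for this toy example; real networks have many more weights and use more elaborate optimizers.

```python
# Toy sketch: one artificial neuron (weight w, bias b) learns y = 2x + 1
# by gradient descent on the mean squared prediction error.
def train_neuron(data, lr=0.05, epochs=500):
    w, b = 0.0, 0.0          # connection weight and bias start at zero
    losses = []              # prediction error recorded after each pass
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = loss = 0.0
        for x, y in data:
            err = (w * x + b) - y        # prediction minus target
            loss += err * err / n
            grad_w += 2 * err * x / n    # d(loss)/dw
            grad_b += 2 * err / n        # d(loss)/db
        w -= lr * grad_w                 # adjust weights to reduce the error
        b -= lr * grad_b
        losses.append(loss)
    return w, b, losses

# Hypothetical training data generated from y = 2x + 1
data = [(x, 2.0 * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
w, b, losses = train_neuron(data)
print(w, b, losses[0], losses[-1])
```

The recorded losses shrink from one pass to the next, which is the "iteratively adjusts the connection weights to reduce prediction errors" step in miniature.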

Q: What is deep learning used for?

Berlyand: Deep learning has dramatically changed many areas of science and technology, including speech recognition, computer vision, and natural language processing. Simple examples include classification problems, such as a cell phone determining whether a face is yours, or sorting images of the handwritten digits 0 to 9. In the latter, the input is an image that is converted into a vector whose components are the intensities of the individual pixels. The output is the category assigned to the image: 0, 1, 2, and so on. Recently, ANN-based large language models have become widely popular due to their excellent performance in applications such as education, medicine, and scientific research. In fact, so far this year, ChatGPT has been reaching approximately 700 million users each week.
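
A minimal sketch of that input-to-output pipeline follows; it is an editorial illustration, not code from the researchers. It uses a tiny 3-by-3 image and two made-up classes ("vertical bar" and "horizontal bar") instead of handwritten digits, and the weights are hand-picked templates rather than trained values. The helper names `flatten`, `softmax`, and `classify` are hypothetical.

```python
import math

def flatten(image):
    """Turn a 2-D grid of pixel intensities into one vector."""
    return [pixel for row in image for pixel in row]

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(image, weights, biases):
    """Score each class with a weighted sum of pixels, then pick the best."""
    x = flatten(image)
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    probs = softmax(scores)
    return probs.index(max(probs)), probs

# Hypothetical 3x3 image of a vertical bar (1.0 = dark pixel, 0.0 = blank)
image = [[0.0, 1.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 1.0, 0.0]]

# Hand-picked (not trained) weights: class 0 = vertical bar, class 1 = horizontal bar
weights = [[0, 1, 0, 0, 1, 0, 0, 1, 0],
           [0, 0, 0, 1, 1, 1, 0, 0, 0]]
biases = [0.0, 0.0]

label, probs = classify(image, weights, biases)
print(label, probs)
```

For the digit problem in the text, the same structure would apply with a longer pixel vector and ten output classes, with the weights learned during training instead of written by hand.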

Krubchitskyi: Deep learning networks are particularly good at analyzing large amounts of unstructured data such as images and text. They are widely used in chatbots, in the image recognition needed for self-driving cars, and in recommendation services such as those used by video streaming platforms.

Q: What makes deep learning “deep”?

Berlyand: Deep artificial neural networks have many hidden layers between the input and output layers. For example, in a model that classifies handwritten digits from 0 to 9, one layer can focus on the edges in the image and another on the darkness of certain pixels, with each successive layer identifying increasingly complex features. It has been empirically observed that adding more layers increases the accuracy of the ANN and allows it to answer more complex questions. Models with more layers are considered “deeper,” hence the name “deep learning.”
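
To make the idea of stacked layers concrete, here is a small editorial sketch (not from the interview) of a forward pass through a network with two hidden layers. The layer sizes, the weights, and the choice of the ReLU activation are assumptions made for illustration; the weights are written by hand rather than learned.

```python
def relu(v):
    """Common activation function: keep positive values, zero out the rest."""
    return [max(0.0, a) for a in v]

def dense(x, W, b):
    """One layer: each output node is a weighted sum of the inputs plus a bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def forward(x, layers):
    """Pass the input through every layer in order; depth = number of layers."""
    for i, (W, b) in enumerate(layers):
        x = dense(x, W, b)
        if i < len(layers) - 1:   # activation on hidden layers only
            x = relu(x)
    return x

# Hypothetical network: 2 inputs -> hidden (2) -> hidden (2) -> 1 output
layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),    # hidden layer 1
    ([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.5]),   # hidden layer 2
    ([[2.0, -1.0]], [0.0]),                    # output layer
]

output = forward([1.0, -2.0], layers)
print(output)
```

Making the network "deeper" in this sketch simply means appending more `(W, b)` pairs to `layers`, so that each added layer transforms the features produced by the one before it.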

Krubchitskyi: Deep learning models can have hundreds of such layers and anywhere from millions to trillions of parameters. In deep learning, humans do not explicitly program all of the connections between layers. The model itself establishes these connections and automatically discovers the relevant features. This type of model is often called a “black box” because we don’t know exactly what is going on inside it. One of our goals is to apply mathematical tools to better understand what these models are actually doing, with the aim of ensuring robustness and ultimately improving performance.

Q: What do we get when we apply mathematical fundamentals to deep learning?

Berlyand: Deep learning was originated and developed primarily by computer scientists and engineers. My Penn State colleague Pierre-Emmanuel Jabin, a distinguished professor of mathematics, and I wanted to provide a rigorous mathematical basis for various performance criteria for ANNs, such as the stability and convergence of training algorithms and the point at which a network is considered “trained.” This motivation led me to write a simple introductory textbook for undergraduate mathematics students in which the definitions and concepts of deep learning are presented in a precise mathematical framework.

What I like to tell my students is that you can be a race car driver and know how to operate a car, but you can’t improve a car or design a new one unless you know what’s inside. Similarly, a mathematical understanding of deep learning can improve the prediction accuracy and overall performance of ANNs.

Krubchitskyi: There are so many different use cases for deep learning, but the underlying mathematics is the same. Understanding the basics of deep learning is critical to creating reliable, interpretable, and robust networks.

Computer scientists and engineers have a number of tools to improve the performance of ANNs, mainly based on empirical observations. We offer a rich set of mathematical theories that have been developed over decades and even centuries and have been applied and refined in a variety of fields, including physics, materials science, and the life sciences. Using mathematics in deep learning can help you understand which types of problems are best suited for ANNs, how best to build networks, and how long they need to be trained, and it can generally help improve stability.
