Convolutional neural network tutorial [Update]

Artificial intelligence has made great strides, seamlessly bridging the gap between human and machine capabilities. And data enthusiasts around the world are working on different aspects of AI to turn vision into reality. One such amazing field is that of computer vision. This field aims to enable and organize machines to perceive the world in the same way as humans, and to use this knowledge for several tasks and processes (e.g. image recognition, image analysis and classification). is. Advances in computer vision through deep learning have also been highly successful, particularly in convolutional neural network algorithms.

Introducing CNN

Yann LeCun, director of the Facebook AI Research Group, is a pioneer in convolutional neural networks. In his 1988 he built the first convolutional neural network called LeNet. LeNet was used for character recognition tasks such as reading postal codes and numbers.

Wondering how facial recognition works in social media, how object detection can help build self-driving cars, and how visual images are used in the medical field to detect diseases. Have you ever wondered? It's all possible thanks to Convolutional Neural Networks (CNN). Below is an example of how convolutional neural networks work.

Suppose you have an image of a bird and you want to identify whether it is really a bird or some other object. The first thing we do is feed the pixels of the image in the form of an array to the input layer of a neural network (a multilayer network used to classify things). Hidden layers perform feature extraction by performing various calculations and operations. There are multiple hidden layers that perform feature extraction from images, such as convolutional layers, ReLU layers, and pooling layers. Finally, we have a fully connected layer that identifies objects in the image.

Convolutional neural network to identify bird images

Figure: Convolutional neural network for identifying bird images

What is a convolutional neural network?

Convolutional neural networks are feedforward neural networks commonly used to analyze visual images by processing data in a grid-like topology. Also called ConvNet. Convolutional neural networks are used to detect and classify objects in images.

Below is a neural network that identifies two types of flowers: orchids and roses.

In CNN, every image is represented in the form of an array of pixel values.

Convolution operations form the basis of convolutional neural networks. Let us understand the convolution operation using two one-dimensional matrices a and b.

a = [5,3,7,5,9,7]

b= [1,2,3]

The convolution operation multiplies an array element by element and sums the products to create a new array representing a*b.

The first three elements of matrix a are multiplied by the elements of matrix b. The products are summed to give the result.

The next three elements of matrix a are multiplied by the elements of matrix b, and the products are summed.

This process continues until the convolution operation is complete.

How does CNN recognize images?

Consider the following image.

A colored box represents a pixel value of 1, and an uncolored box represents a 0.

Press backslash (\) to process the image below.

Press the slash (/) to process the next image.

Here's another example of how CNN recognizes images.

As you can see from the image above, only values with a value of 1 are lit.

Convolutional Neural Network Layers

Convolutional neural networks have multiple hidden layers that help extract information from images. His four key layers for CNN are:

convolutional layer
ReLU layer
pooling layer
fully connected layer

convolutional layer

This is the first step in the process of extracting valuable features from images. The convolution layer has several filters that perform convolution operations. Every image is considered as a matrix of pixel values.

Consider the following 5×5 image with pixel values of 0 or 1. There is also a filter matrix of dimension 3×3. Slide the filter matrix over the image and calculate the dot product to obtain the convolutional feature matrix.

ReLU layer

ReLU stands for rectified linear unit. Once the feature maps are extracted, the next step is to move them to the ReLU layer.

ReLU performs element-wise operations and sets all negative pixels to 0. This introduces nonlinearity into the network and the output produced is a modified feature map. Below is a graph of the ReLU function.

The original image is scanned using multiple convolutions and ReLU layers to identify features.

pooling layer

Pooling is a downsampling operation that reduces the dimensionality of a feature map. The modified feature map is passed through a pooling layer to generate a pooled feature map.

The pooling layer uses different filters to identify different parts of the image, such as edges, horns, bodies, feathers, eyes, and beaks.

The structure of a convolutional neural network so far is as follows.

The next step in the process is called flattening. Flattening is used to convert all two-dimensional arrays resulting from the pooled feature maps into a single long continuous linear vector.

The flattened matrix is fed as input to a fully connected layer to classify the image.

Here's how CNN accurately recognizes birds.

The pixels of the image are fed into a convolution layer that performs a convolution operation.
A convolved map will be generated
The convolutional map is applied to the ReLU function to generate a modified feature map.
Images are processed with multiple convolutions and ReLU layers to identify features.
Different pooling layers with different filters are used to identify specific parts of the image.
The pooled feature map is flattened and fed into a fully connected layer to obtain the final output.

Implementing a use case using CNN

We use the Canadian Institute for Advanced Study's CIFAR-10 dataset to classify images into 10 categories using CNN.

1. Download the dataset.

2. Import the CIFAR data set.

3. Read the label name.

4. Display the image using matplotlib.

5. Use helper functions to process the data.

6. Create the model.

7. Apply helper functions.

8. Create layers for convolution and pooling.

9. Reshape the pooling layer to create a flattened layer.

10. Create fully connected layers.

11. Set the output to the y_pred variable.

12. Apply the loss function.

13. Create an optimizer.

14. Create variables to initialize all global variables.

15. Create a graph session and run the model.

Build deep learning models with TensorFlow and learn the TensorFlow open source framework with the Deep Learning course (using Keras &TensorFlow). Register now!

Learn more about CNN and deep learning

Here's how to build a CNN with multiple hidden layers and use its pixel values to identify birds. We also completed a demonstration of classifying images into 10 categories using the CIFAR dataset.

You can also enroll in artificial intelligence courses in collaboration with Caltech and IBM to specialize in deep learning techniques using TensorFlow, an open source software library designed to conduct research in machine learning and deep neural networks. It can also be transformed into a house. This PG program in AI and Machine Learning covers Python, Machine Learning, Natural Language Processing, Speech Recognition, Advanced Deep Learning, Computer Vision, and Reinforcement Learning. Get ready for the world's most exciting technology frontier.

Source link