An Ultimate User’s Directory: ALI5 Package for Convolutional Neural Networks

Over the last decade, machine intelligence has developed rapidly, and its capabilities are gradually approaching those of humans. The field is multifaceted, and one of its critical areas is computer vision.

The goal of computer vision is to let machines see and perceive the world as people do. This ability can be applied to tasks such as image classification, image processing, and recommendations based on image content. Much of the progress in this field is due to one algorithm: the Convolutional Neural Network (CNN).


A fully connected neural network sees an image as a flat list of numbers taken from three matrices, one per color channel (RGB): the first matrix holds the red component, the second the green, and the third the blue. Processing large images with many pixels this way is challenging, because it demands substantial memory and computing power.
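To make the scale of the problem concrete, here is a minimal sketch in plain Python. The tiny 2×2 image, the helper names, and the 224×224 input with 1000 units are illustrative assumptions, not values from this article.

```python
# A tiny 2x2 RGB image stored as height x width x channel (made-up values).
image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]


def split_channels(img):
    """Return one matrix per color channel, in the order (red, green, blue)."""
    return tuple(
        [[pixel[c] for pixel in row] for row in img]
        for c in range(3)
    )


def flatten(img):
    """Flatten the image into the single list of numbers a dense layer sees."""
    return [value for row in img for pixel in row for value in pixel]


def dense_weight_count(height, width, channels, units):
    """Weights needed to connect every input value to every unit (no biases)."""
    return height * width * channels * units


red, green, blue = split_channels(image)
flat = flatten(image)

print(red)                                     # red channel matrix
print(len(flat))                               # 2 * 2 * 3 values
print(dense_weight_count(224, 224, 3, 1000))   # weights for one dense layer
```

Even a modest 224×224 color image connected to a hypothetical layer of 1000 units needs about 150 million weights, which is why a fully connected network scales poorly on images.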

A convolutional neural network can be described as a learning algorithm that takes an input image, learns which features in it matter, and uses them to classify images, grouping similar ones together.

A great benefit of this architecture is that it processes visual data quickly, which helps in practical tasks such as recognizing criminals in crowded streets.

ConvNets are Feed-Forward Neural Nets

Convolutional neural networks are feed-forward nets: information flows through the layers in one direction only, from input to output. As a result, a ConvNet works well for image recognition but is not suited to predicting stock prices, or to telling whether a door in a picture is being opened or closed. In other words, the network can learn to understand complex images, but it cannot infer the previous or following state of the objects in them.

Input Image

Imagine a 4×4×3 RGB image separated into its three color planes. As the image grows, the calculations become far more intensive. This is where a ConvNet helps: it reduces the image to a form that is easier to process while keeping the features that matter for prediction.

Convolution Process

Let’s see how a 5×5×1 image is convolved with a 3×3×1 kernel. The kernel is the central element of the process: it slides over the image and multiplies its values element-wise with the patch of the image beneath it.

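The kernel moves to the right by a fixed step length (the stride) until it has covered the entire width of the image, then drops down to the left edge and repeats until the whole image has been traversed. If the image has multiple channels, the kernel has the same depth as the input image, and the results for all channels are summed into a single output.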
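The sliding and summing described above can be sketched in plain Python. This is a minimal illustration with no padding, not an optimized implementation; the function names and the sample image and kernel values are assumptions chosen for the example.

```python
def conv2d(image, kernel, stride=1):
    """Slide `kernel` over a single-channel `image` with no padding.

    At each position the kernel is multiplied element-wise with the patch
    beneath it, and the products are summed into one output value.
    """
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1
    output = []
    for i in range(out_h):            # drop down after each full pass
        row = []
        for j in range(out_w):        # move right across the width
            top, left = i * stride, j * stride
            row.append(sum(
                image[top + a][left + b] * kernel[a][b]
                for a in range(k_h)
                for b in range(k_w)
            ))
        output.append(row)
    return output


def conv2d_multichannel(channels, kernels, bias=0, stride=1):
    """Convolve each channel with its own kernel slice, then sum the results."""
    maps = [conv2d(c, k, stride) for c, k in zip(channels, kernels)]
    return [
        [sum(m[i][j] for m in maps) + bias for j in range(len(maps[0][0]))]
        for i in range(len(maps[0]))
    ]


# Illustrative 5x5 single-channel image and 3x3 kernel.
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = conv2d(image, kernel)        # 3x3 output
strided = conv2d(image, kernel, stride=2)  # 2x2 output
print(feature_map)
print(strided)
```

With a stride of 1 the 5×5 input yields a 3×3 feature map; with a stride of 2 the kernel skips every other position and the output shrinks to 2×2.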

Consider a convolution with a stride of 2: the kernel moves two pixels at a time, so the output is smaller than with a stride of 1. The goal of the operation is to extract high-level features, such as edges, from the input image. A ConvNet need not be limited to a single convolutional layer. As layers are added, the network adapts to higher-level features and eventually understands the image much as a person does.

The operation can have one of two outcomes. In the first, the convolved feature is smaller than the input; in the second, its size stays the same or increases. These correspond to valid padding and same padding, respectively: valid padding applies no padding at all, while same padding adds a border (typically of zeros) around the input.
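The effect of padding on the output size follows the standard formula for a convolution with zero padding. The sketch below applies it along one dimension; the function names are illustrative, and the same-padding helper assumes a stride of 1 and an odd kernel size.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one dimension, with `padding` zeros on each side."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


def same_padding(kernel_size):
    """Zeros per side that preserve the size at stride 1 (odd kernels only)."""
    return (kernel_size - 1) // 2


# 5x5 input with a 3x3 kernel.
valid = conv_output_size(5, 3)                           # no padding
same = conv_output_size(5, 3, padding=same_padding(3))   # one zero per side
print(valid, same)
```

For the 5×5 image and 3×3 kernel used earlier, valid padding shrinks each dimension to 3, while same padding keeps it at 5.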

Pooling Layer

The pooling layer reduces the spatial size of the convolved feature. This lowers the computation required and, in turn, supports effective training of the model.

There are two types: max pooling and average pooling. The first returns the maximum value from the portion of the feature map covered by the pooling window, and the second returns the average of those values. Max pooling also acts as a noise suppressant, which is why it is often said to perform better than average pooling.
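Both pooling types can be sketched with one small function. This is a minimal illustration in plain Python; the 2×2 window, the stride of 2, and the feature-map values are assumptions chosen for the example.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Reduce each `size` x `size` window to its maximum or its average."""
    if mode == "max":
        reduce_window = max
    else:
        reduce_window = lambda window: sum(window) / len(window)
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Collect the values covered by the window at this position.
            window = [
                feature_map[i * stride + a][j * stride + b]
                for a in range(size)
                for b in range(size)
            ]
            row.append(reduce_window(window))
        output.append(row)
    return output


# Illustrative 4x4 feature map.
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 4, 3, 8],
]

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")
print(max_pooled)
print(avg_pooled)
```

Either way the 4×4 map shrinks to 2×2; max pooling keeps only the strongest response in each window, while average pooling blends all four values.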

Together, the convolutional layer and the pooling layer form the i-th layer of a convolutional neural network. More such layers may be stacked if the images are complex.

After passing through these layers, the model has learned the features of the image, and the output can be passed on for classification.

Final Layer

As a rule, a fully connected layer is added to learn non-linear combinations of the high-level features.

At this stage, the output of the preceding layers is flattened into a column vector. The flattened vector is fed to a feed-forward neural network, and backpropagation is applied at every training iteration. More detailed explanation of different approaches to pattern recognition is described on

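Over a series of training epochs, the model learns to distinguish dominant features from certain low-level features in images, and it classifies the images using the softmax technique.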
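The final stage (flatten, fully connected layer, softmax) can be sketched as a single forward pass. This is a minimal illustration in plain Python; the pooled input, the two-class weights, and the biases are made-up values, and training by backpropagation is not shown.

```python
import math


def flatten(feature_maps):
    """Turn a list of 2-D feature maps into one flat vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]


def dense(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum of all inputs."""
    return [
        sum(w * x for w, x in zip(unit_weights, inputs)) + b
        for unit_weights, b in zip(weights, biases)
    ]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    highest = max(scores)                       # subtract for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical pooled output (one 2x2 map) and weights for two classes.
pooled = [[[6, 4], [7, 9]]]
weights = [
    [0.1, 0.0, 0.0, 0.1],   # class 0
    [0.0, 0.1, 0.1, 0.0],   # class 1
]
biases = [0.0, 0.0]

flat = flatten(pooled)
scores = dense(flat, weights, biases)
probabilities = softmax(scores)
predicted_class = probabilities.index(max(probabilities))
print(flat, scores, probabilities, predicted_class)
```

The class with the highest probability is taken as the prediction. In a real network the weights would be learned during training rather than set by hand.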

CNN architectures such as LeNet, AlexNet, VGGNet, GoogLeNet, ResNet, and ZFNet have played a crucial role in building these algorithms and have made invaluable contributions to the development of AI.