How does the Convolutional Neural Network (CNN) work?

Apr 5, 2021

I love working with Natural Language Processing (NLP), but I had to introduce the Convolutional Neural Network (CNN) while writing my research paper on Bangla fake news detection. I then came across several research papers in which CNNs achieved high accuracy, surpassing Long Short-Term Memory (LSTM) networks. Anyone who works with LSTMs knows how well they handle textual data. I tried to understand the theory behind CNNs rather than just implementing one, and I now feel that was a wise decision: working with a model you do not understand means doing everything without knowing what happens behind the scenes. I am also particularly fond of CNNs because I am currently researching biomedical imaging, where they have shown promising results on medical image data. I think you will fall in love with CNNs after reading this article, just as I did.

What will you learn?

Human brain’s neuron
The relationship of neural networks with neurons in the human brain
How does a CNN extract features from an image?
CNN’s applications in real life

Let’s get started.

Human brain’s neuron

Neurons in the human brain are electrically excitable. An electrical impulse is primarily received on the dendrites, processed in the cell body, and then transmitted along the axon. When the electrical signal reaches the synapse, the neuron releases a special molecule called a neurotransmitter [1].

A brain neuron has three interconnected parts: the dendrites, the soma (cell body), and the axon. Each component is described below:

The Dendrites: These receive information through synaptic connections. The input can be sensory data from sensory nerve cells or “computational” input from other neurons, and a single cell can receive as many as 100K inputs, each from a different cell.

The Soma: This is the cell body, where the inputs from all the dendrites are combined and a decision is made whether to fire an output based on all of these signals. This is a bit of an oversimplification, since some of the computation happens before the soma and is encoded in the cell’s dendritic structure [2].

The Axon: Once the cell decides to fire an output signal (that is, the cell is activated), the axon carries the signal. Through its branching terminal structure, the axon transmits the signal across synaptic connections to the dendrites of the next neurons.

The relationship of neural networks with neurons in the human brain

By now you should have a clear idea of how a neuron in the human brain works. Let me explain another concept that was inspired by the working structure of the human brain. Artificial neural networks are loosely modeled on the way the brain works. The brain easily makes decisions when reading handwriting or recognizing faces. In the case of facial recognition, the brain may begin to ask: “Is it female or male? Is this a black-and-white picture? Is the person young or old? Do they have scars?” and so on.

How does a CNN extract features from an image?

Convolutional neural networks (CNNs) are one of the principal types of neural network. Like other neural networks, they consist of neurons with learnable weights and biases. Each neuron receives multiple inputs, computes a weighted sum of them, passes the result through an activation function, and produces an output. A CNN is usually built from three types of layers: the Convolutional Layer, the Pooling Layer, and the Dense Layer (Fully Connected Neural Network). Just as each neuron in the human brain does part of the work, each type of CNN layer contributes its part when it comes to classifying an image.
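The computation performed by a single neuron (weighted sum, bias, activation) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a CNN library: the function names `neuron`, `relu`, and `sigmoid` and the example inputs, weights, and bias are hypothetical, chosen only to show the arithmetic.

```python
import math


def relu(z):
    """ReLU: pass positive values through unchanged, clip negatives to 0."""
    return max(0.0, z)


def sigmoid(z):
    """Sigmoid: squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation=relu):
    """One artificial neuron: weighted sum of the inputs plus a bias, then an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)


# Hypothetical inputs, weights and bias, chosen only for illustration.
inputs = [1.0, 2.0, 3.0]
weights = [0.5, -0.25, 0.1]
output = neuron(inputs, weights, bias=0.2)
print(output)  # approximately 0.5 (0.5 - 0.5 + 0.3 + 0.2)
```

In a real network the weights and bias are not hand-picked like this; they are learned during training.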

CNNs offer the following advantages for image classification:

The image can be fed into the CNN directly as a grid of pixels, without first flattening it into a one-dimensional vector, so the spatial structure of the image is preserved.
The CNN learns its filters from the training data. For example, if we use pictures of cats as training data, the CNN learns filters based on features of each image (e.g. eyes, nose, ears, and so on) that help it detect a cat.
It is relatively fast and simple to understand, and CNNs are among the most accurate algorithms for image prediction tasks.

Since a CNN has three types of layers (Convolutional Layer, Pooling Layer and Dense Layer), let’s see how each one works, starting with the Convolutional Layer.

1. Convolutional Layer: Convolutional layers are the main building blocks of convolutional neural networks. A convolutional layer applies a filter to an input to produce an activation, called a feature map. Because the same filter is applied at every position of the image, a distinctive feature can be detected wherever it appears in the input.

Figure 3 shows an image of size 5x5 with some features marked in yellow. The CNN learns a filter (kernel) from the features in the input images, and this filter allows it to detect those features. For example, if the image shows a person, the relevant features would be the eyes, nose, ears, mouth, lips, and so on. In real life, when we see someone, we first try to recognize them by matching some of their characteristics; this is how the human brain works.

The filter shown in the figure above is 3x3. The filter size is a hyperparameter that we choose when designing the network (for example 3x3, 4x4, or 5x5), while the values inside the filter are learned automatically during training. In practice, 3x3 is the most common choice.

The filter slides across the image. At each position, the filter values are multiplied element-wise with the pixel values underneath, the products are summed, and the result is stored in a feature map. The size of the feature map is given by the following equation:

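Map size = N - F + 1
Here, N is the size (width or height) of the input image, F is the size of the filter, and the +1 counts the filter’s first position. This formula assumes the filter moves one pixel at a time (stride 1) with no padding; for example, a 5x5 image with a 3x3 filter gives a 3x3 feature map.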
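To make the feature-map equation concrete, here is a minimal pure-Python sketch of a single-channel convolution with stride 1 and no padding. The function name `conv2d` and the 5x5 image and 3x3 filter values are hypothetical, chosen only for illustration; in a real CNN the filter values are learned, and a deep learning framework would compute this far more efficiently. Like most deep learning libraries, the sketch slides the filter without flipping it (strictly speaking, cross-correlation).

```python
def conv2d(image, kernel):
    """Slide `kernel` over a square `image` (stride 1, no padding).

    At each position the overlapping values are multiplied element-wise and
    summed. The kernel is not flipped (strictly, this is cross-correlation).
    """
    n, f = len(image), len(kernel)
    out_size = n - f + 1  # Map size = N - F + 1
    feature_map = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0
            for di in range(f):
                for dj in range(f):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 5x5 binary image (N = 5) and 3x3 filter (F = 3).
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = conv2d(image, kernel)
for row in feature_map:
    print(row)
# [4, 3, 4]
# [2, 4, 3]
# [2, 3, 4]
```

With N = 5 and F = 3 the result is a 3x3 feature map, matching N - F + 1 = 3. Larger values mark positions where the image patch lines up more closely with the filter’s pattern.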

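If the input is an RGB image, the filter has a separate slice for each of the three color channels. Each channel is convolved with its own slice, and the results are added together to form the feature map.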
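The per-channel step can be sketched as follows, assuming the standard convolutional-layer behavior in which one filter spans all input channels and its per-channel results are summed (plus a bias) into a single feature map. The names `conv2d` and `conv2d_rgb` and the tiny 3x3 image planes and 2x2 filter slices are hypothetical, for illustration only.

```python
def conv2d(image, kernel):
    """Single-channel convolution with stride 1 and no padding (output N - F + 1)."""
    n, f = len(image), len(kernel)
    size = n - f + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(f) for dj in range(f))
            for j in range(size)
        ]
        for i in range(size)
    ]


def conv2d_rgb(channels, kernels, bias=0):
    """Convolve each color channel with its own kernel slice and sum the results.

    `channels` holds the R, G and B planes of the image and `kernels` holds one
    filter slice per channel; the output is a single feature map.
    """
    maps = [conv2d(c, k) for c, k in zip(channels, kernels)]
    size = len(maps[0])
    return [[sum(m[i][j] for m in maps) + bias for j in range(size)]
            for i in range(size)]


# Hypothetical 3x3 RGB image (one plane per channel) and 2x2 filter slices.
red = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
green = [[2, 2, 2], [2, 2, 2], [2, 2, 2]]
blue = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
ones = [[1, 1], [1, 1]]

rgb_map = conv2d_rgb([red, green, blue], [ones, ones, ones])
print(rgb_map)  # [[12, 12], [12, 12]]: 4 from red + 8 from green + 0 from blue
```

Because the three per-channel results are summed, one filter produces one feature map regardless of how many color channels the input has.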

2. Pooling Layer: The pooling layer is another component of a CNN. Its goal is to progressively shrink the spatial size of the representation in order to reduce the number of parameters and the amount of computation in the network. The pooling layer operates on each feature map independently. There are two common forms of pooling, max pooling and average pooling, of which max pooling is the most common. Because pooling reduces the size of the representation, it also reduces the number of parameters in later layers, which helps reduce overfitting.

Figure 4 shows how the pooling layer reduces the size of the feature map using each of the two procedures (max pooling and average pooling).
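Both pooling procedures can be sketched in plain Python. This assumes a 2x2 window with stride 2, a common setting but an assumption here, since the settings behind Figure 4 are not stated. The function `pool2d` and the 4x4 feature map values are hypothetical.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Downsample a 2D feature map with max or average pooling."""
    if mode not in ("max", "average"):
        raise ValueError("mode must be 'max' or 'average'")
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(0, rows - size + 1, stride):
        row = []
        for j in range(0, cols - size + 1, stride):
            # Collect the values inside the current size x size window.
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 3, 4, 8],
]
print(pool2d(feature_map, mode="max"))      # [[6, 4], [7, 9]]
print(pool2d(feature_map, mode="average"))  # [[3.75, 2.25], [3.25, 5.25]]
```

Either way, the 4x4 map (16 values) shrinks to 2x2 (4 values). Max pooling keeps the strongest response in each window, while average pooling keeps the mean.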

3. Dense Layer (Fully Connected Neural Network): In a dense layer, each neuron receives input from every neuron in the previous layer, which is why the layer is called densely (or fully) connected. The feature maps produced by the earlier layers are flattened into a vector and passed through one or more dense layers, which classify the input image. We also have to choose activation functions such as ReLU, Softmax, and Sigmoid. The choice depends on the task: Softmax is typically used in the output layer for multiclass classification and Sigmoid for binary classification, while ReLU is common in hidden layers.
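The final stage (flatten, dense layer, output activation) can be sketched in plain Python. This is a minimal illustration, not a framework API: the functions `flatten`, `dense`, `softmax`, and `sigmoid`, the pooled values, and all weights are hypothetical, whereas a real network learns its weights during training.

```python
import math


def flatten(feature_maps):
    """Turn a list of 2D feature maps into one 1D vector for the dense layer."""
    return [value for fmap in feature_maps for row in fmap for value in row]


def dense(x, weights, biases):
    """Fully connected layer: each output neuron uses every input value."""
    return [sum(xi * wi for xi, wi in zip(x, w_row)) + b
            for w_row, b in zip(weights, biases)]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1 (multiclass output)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Convert one raw score into a probability in (0, 1) (binary output)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical pooled feature map and weights; a real network learns the weights.
x = flatten([[[6, 4], [7, 9]]])  # [6, 4, 7, 9]

# Multiclass: 3 output neurons, one per class, followed by softmax.
class_weights = [
    [0.1, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.1],
    [0.0, 0.0, 0.0, 0.0],
]
probs = softmax(dense(x, class_weights, [0.0, 0.0, 0.0]))
predicted_class = probs.index(max(probs))  # class 1 has the highest score (about 0.9)

# Binary: 1 output neuron followed by sigmoid.
p_positive = sigmoid(dense(x, [[0.05, 0.05, 0.05, 0.05]], [-1.0])[0])
print(probs, predicted_class, p_positive)
```

The multiclass head needs one output neuron per class so that Softmax can spread probability across them, while the binary head needs only a single neuron whose Sigmoid output is read as the probability of the positive class.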

CNN’s applications in real life

Image recognition and OCR
Object detection for self-driving cars
Face recognition on social media
Image analysis in healthcare
Medical Image Computing

Conclusion

To conclude, this article covered the fundamentals of CNNs and showed how deep learning relates to the neurons of the human brain. I hope you found it useful.

References

[1] How does a neuron work? (2021). Retrieved 5 April 2021, from https://www.wingsforlife.com/en/latest/how-does-a-neuron-work-562/
[2] Do neural networks really work like neurons? (2021). Retrieved 5 April 2021, from https://medium.com/swlh/do-neural-networks-really-work-like-neurons-667859dbfb4f
[3] Blog. (2021). Retrieved 5 April 2021, from https://glastonburyc.github.io/neuralprimer.html