Neural networks, the buzzword of the day, were in fact invented in the mid-1900s. Backpropagation, the algorithm that trains a neural network, was popularized in 1986 by David Rumelhart, Geoffrey Hinton, and Ronald Williams, building on ideas that date back to the 1960s and 1970s. So why are you only hearing about it now? Because until recently we had neither the computational power nor the huge amounts of data needed to use this technology at scale. If you are looking for a solid career in AI, you need to understand neural networks.
In this blog, we talk about exactly that. First, we briefly review neural networks. Once we have built some intuition, we move on to exploring different types of neural networks and their use cases.
Deep Neural Networks (DNN)
Neural networks were inspired by how the brain works. The fundamental unit of a neural network is a neuron. A neuron receives multiple inputs, combines them into a weighted sum (a linear combination), and passes that sum through an activation function. The output of the activation function is the final output of the neuron. Consider the following diagram.
Image credit – Cloudinary
Each input (x1, x2, ...) is multiplied by its respective weight (w1, w2, ...) and then these products are added (w1*x1 + w2*x2 + ...). This sum is the linear combination mentioned earlier. It is then passed through the activation function f. The output of this function is a single number denoted by y.
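To make the formula concrete, here is a minimal sketch of a single neuron in plain Python. The text does not say which activation function f is used, so the sigmoid below is an assumption, and the name neuron is just an illustrative choice. Like the formula above, the sketch has no bias term.

```python
import math

def sigmoid(z):
    """Assumed activation function f: squashes any number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights):
    """Compute y = f(w1*x1 + w2*x2 + ...) for one neuron."""
    # Linear combination: multiply each input by its weight and add them up.
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    # Activation: turn the weighted sum into the neuron's single output.
    return sigmoid(weighted_sum)

# 0.5*1.0 + (-0.25)*2.0 = 0.0, and sigmoid(0.0) = 0.5
print(neuron([1.0, 2.0], [0.5, -0.25]))  # prints 0.5
```

Changing the weights changes the output for the same inputs, which is exactly what training adjusts.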
The power of a neural network comes from combining many of these neurons. For example:
Image Credits: Wikimedia
Each circle is a neuron. There are 3 layers: the input layer, the hidden layer, and the output layer. The outputs of the neurons in one layer are passed to the next layer. The number of neurons in the hidden layer can be varied. In fact, we can have more than one hidden layer.
So where do the weights come from? They are learned using the backpropagation algorithm. Consider a supervised learning problem where the algorithm is fed labeled data. For example, a picture of a cat can be a data point with the label ‘cat’, and a picture of a dog can be a data point with the label ‘dog’. The algorithm has to learn from these labeled data points so that, for a new picture whose label is not available, it can classify whether the picture shows a dog or a cat.
Now, if a labeled picture is fed in, the output layer predicts something. The error is calculated as the deviation of the model’s output from the actual label. This error is then used to correct all the weights in the network. As more and more labeled points are fed in, the weights are corrected again and again until the network settles on weights that give a low error.
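The layer-to-layer flow described above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the sigmoid activation, the layer sizes (2 inputs, 3 hidden neurons, 1 output), the weight values, and the names layer and forward are all made up for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weight_rows):
    """One layer: every row of weights belongs to one neuron in the layer."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)))
            for row in weight_rows]

def forward(inputs, hidden_weights, output_weights):
    """Pass the inputs through the hidden layer, then the output layer."""
    hidden = layer(inputs, hidden_weights)   # outputs of the hidden neurons
    return layer(hidden, output_weights)     # fed as inputs to the next layer

# Hypothetical weights: 3 hidden neurons with 2 weights each,
# and 1 output neuron with 3 weights (one per hidden neuron).
hidden_weights = [[0.2, -0.4], [0.7, 0.1], [-0.5, 0.3]]
output_weights = [[0.6, -0.1, 0.8]]

print(forward([1.0, 2.0], hidden_weights, output_weights))
```

Adding another hidden layer would simply mean one more call to layer between the two existing ones.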
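Here is a rough sketch of that predict, measure error, correct loop. To keep it small it trains a single sigmoid neuron with gradient descent, which is the one-neuron special case of what backpropagation does across many layers. Several things are assumptions made for illustration: the toy dataset (the logical AND of two inputs instead of cat and dog pictures), the bias term, the learning rate, the number of passes, and the log-loss error whose gradient is simply (prediction - label) * input.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, inputs):
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_neuron(data, epochs=2000, learning_rate=0.5):
    """Feed labeled points repeatedly, nudging the weights after each one."""
    weights = [0.0, 0.0]
    bias = 0.0  # extra learnable offset, not part of the formula in the text
    for _ in range(epochs):
        for inputs, label in data:
            # Deviation of the model's output from the actual label.
            error = predict(weights, bias, inputs) - label
            # Correct every weight in proportion to its share of the error.
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias

# Toy labeled data: the label is 1 only when both inputs are 1.
AND_DATA = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

weights, bias = train_neuron(AND_DATA)
for inputs, label in AND_DATA:
    print(inputs, label, round(predict(weights, bias, inputs)))
```

In a real multi-layer network the same idea applies, except that the error has to be passed backwards through each layer to work out every weight's correction.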
Let’s take a look at three interesting use cases –
Language Modelling – RNNs
Deep learning (that is, neural networks with many layers) is heavily used in Natural Language Processing, or NLP. NLP is a field of AI that deals with natural language: the text we humans generate through online comments, blogs, and books. The data here are words, letters, and characters, and they come in a sequence. If the order of the words in a sentence is changed, the sentence no longer means the same thing. With tabular or image data, by contrast, it does not matter which row or picture you feed to your network first, so that data is not sequential. To deal with sequential data, we have a special kind of network called a Recurrent Neural Network, or RNN.
Image Credits – Medium
x0, x1, ... are the inputs to the network. h0, h1, ... are the outputs. Here, the inputs can be the words in a sentence. Consider a task where we ask the neural network to learn to generate natural language. To learn how the language works, it has to be trained on a large amount of text. Consider the following sentence: ‘Hope you are doing great’. The first word ‘Hope’ is the input x0. The label for the input x0 is the next word ‘you’. Similarly, for the input x1 (‘you’), the label is the third word ‘are’. The magic of RNNs lies in the fact that the inputs are not just x0, x1, etc. For the input x1 (‘you’), the hidden state h0 produced by the previous input-label pair (‘Hope’-‘you’) is also passed in as another part of the input. This is denoted by the right arrow in the picture. But why? Because these hidden states contain information about the previous steps. Since this is sequential data, that information helps the model learn better.
OpenAI, a leading AI firm, has built a state-of-the-art language model called GPT-2. GPT-2 is based on the Transformer architecture rather than an RNN, but it is trained on the same next-word prediction task described above. How good is it? Here is an example of text generated by the model –
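Both ideas from this paragraph can be sketched in plain Python: building the (input, label) pairs from a sentence, and carrying a hidden state from one step to the next. The sketch is a toy under assumptions: the RNN has a single hidden unit, the inputs are plain numbers rather than word vectors, the weights w_x and w_h are arbitrary values, and tanh is an assumed activation.

```python
import math

def next_word_pairs(sentence):
    """Pair every word with the word that follows it: (input, label)."""
    words = sentence.split()
    return list(zip(words[:-1], words[1:]))

def run_rnn(inputs, w_x=0.5, w_h=0.8):
    """Run a one-unit RNN and return the hidden state after every step."""
    h = 0.0  # there is no previous step before the first input
    states = []
    for x in inputs:
        # The new hidden state mixes the current input with the previous state.
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states

print(next_word_pairs("Hope you are doing great"))

# Same numbers, different order: the final hidden states differ,
# because each state remembers what came before it.
print(run_rnn([1.0, 2.0]))
print(run_rnn([2.0, 1.0]))
```

The first print shows the training pairs (‘Hope’, ‘you’), (‘you’, ‘are’), and so on. The last two show why an RNN is sensitive to ordering, which a plain feed-forward network fed one item at a time is not.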
The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science.
Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved.
Dr. Jorge Pérez, an evolutionary biologist from the University of La Paz, and several companions were exploring the Andes Mountains when they found a small valley, with no other animals or humans. Pérez noticed that the valley had what appeared to be a natural fountain, surrounded by two peaks of rock and silver snow.
Pérez and the others then ventured further into the valley. “By the time we reached the top of one peak, the water looked blue, with some crystals on top,” said Pérez.
Pérez and his friends were astonished to see the unicorn herd. These creatures could be seen from the air without having to move too much to see them – they were so close they could touch their horns.
While examining these bizarre creatures the scientists discovered that the creatures also spoke some fairly regular English. Pérez stated, “We can see, for example, that they have a common ‘language,’ something like a dialect or dialectic.”
Dr. Pérez believes that the unicorns may have originated in Argentina, where the animals were believed to be descendants of a lost race of people who lived there before the arrival of humans in those parts of South America.
It hardly reads as AI-generated, right? That is how capable modern language models have become.
Image Captioning
Images can be classified using another special type of neural network called a convolutional neural network, or CNN. An image is basically a matrix of pixel values, where each pixel value defines the color of one spot in the image. CNNs are an extension of ordinary neural networks. They add convolution operations that extract important features from an image. These features are then fed to the rest of the neural network. Consider the diagram below.
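As a rough illustration of what a convolution operation does to a matrix of pixel values, here is a small sketch in plain Python. The image and the kernel are made-up examples, and the function name convolve2d is just an illustrative choice. Like most deep learning libraries, it slides the kernel over the image without flipping it (strictly speaking, a cross-correlation) and uses no padding.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            total = 0
            for a in range(k_rows):
                for b in range(k_cols):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output

# A 4x4 image: dark (0) on the left half, bright (1) on the right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# A kernel that responds where brightness increases from left to right.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

print(convolve2d(image, vertical_edge))
# prints [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output is large only in the middle column, where the dark half meets the bright half. In a CNN the kernel values are not hand-picked like this but learned during training.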
Image credits – uwjlkarn Files
Consider a task where you need to automate captioning an image.
Image credits – CND images
The captions in the image are generated by AI. You just pass the image through the model and it generates the caption for you. But how is this possible? It is possible by combining a CNN with an RNN.
Consider a labeled dataset of images and their captions. You pass the image through a CNN. The output of this CNN is passed to the RNN as its initial hidden state. The RNN also receives the caption sentence as input, just like in the language modeling task discussed earlier. Thus the model is trained to learn patterns from an image, extract useful information from it, and then feed that information to the language model. Such an architecture is called an encoder-decoder architecture. Here, the encoder is the CNN and the decoder is the RNN.
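The wiring between encoder and decoder can be sketched as follows. This is a toy showing only the data flow, not a working captioning model: cnn_encoder is a hypothetical stand-in that just averages each pixel row instead of running real convolutions, the caption words are represented by made-up numbers rather than learned word vectors, and the weights are arbitrary.

```python
import math

def cnn_encoder(image):
    """Hypothetical stand-in for a CNN: one feature per row of pixels."""
    return [sum(row) / len(row) for row in image]

def rnn_decoder_step(x, h_prev, w_x=0.5, w_h=0.8):
    """One recurrent step, applied to each element of the hidden state."""
    return [math.tanh(w_x * x + w_h * h) for h in h_prev]

def caption_hidden_states(image, caption_values):
    """Encode the image, then unroll the decoder over the caption."""
    h = cnn_encoder(image)  # image features become the initial hidden state
    states = []
    for x in caption_values:  # one number per caption word (toy encoding)
        h = rnn_decoder_step(x, h)
        states.append(h)
    return states

caption = [0.1, 0.2, 0.3]  # made-up numeric stand-ins for three words
print(caption_hidden_states([[0, 0], [1, 1]], caption))
print(caption_hidden_states([[1, 1], [0, 0]], caption))
```

The same caption produces different hidden states for the two images, which is how information about the picture reaches the language model at every step.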
Machine Translation
We have all used Google Translate many times, and its translations are often remarkably accurate. But how is it able to do that? Writing a set of rules by hand for every language pair is impractical. This is where AI comes to the rescue. In the previous use case, we saw how a CNN-RNN pair can perform the task of image captioning. What if we use an RNN-RNN pair to translate text?
Image credits – CDN Images
A sentence in language A is fed to the encoder RNN (in blue), and the output of that RNN is fed to the decoder RNN (in red), which generates the sentence in the target language. When trained on a large number of sentence pairs, the model can learn to translate text effectively.
Conclusion
Hope you enjoyed this blog. This is the right time to be in AI, and it is one of the most exciting fields to work in. But how do you get started? We have got your back. Dimensionless provides a wide variety of expert-delivered courses to help you get started.