
Neural networks are often described as universal function approximators. Given an appropriate architecture and enough data, they can learn almost any representation. Consequently, neural networks have been applied to many interesting tasks: image classification, question answering, generative modeling, robotics and many more.

In this tutorial, we implement a popular task in Natural Language Processing called language modeling. A language model is a neural network trained to predict the next word in a piece of text, which lets it learn the patterns of a natural language well enough to generate it. We implement this model using PyTorch, a popular deep learning library.

Pre-requisites:

• Basic familiarity with Python, Neural Networks and Machine Learning concepts.
• Anaconda distribution of Python with PyTorch installed. Refer to the PyTorch installation page.

### Brief Overview of Neural Networks

A traditional neural network has an architecture with three types of layers: input, hidden and output. Each of these layers consists of neurons. For a brief recap, consider the image below.

Image credits: Ujwlkarn

Suppose we have a multi-dimensional input (x1, x2, …, xn). In the picture above, n = 2. Each input has an associated weight, so we have n weights (w1, w2, …, wn). The inputs are multiplied by their respective weights and then added together. To this weighted sum, a constant term called the bias is added. The weights and the bias are all learned during training. So far we have

a = w1*x1 + w2*x2 + w3*x3 + … + wn*xn + b

This quantity is then passed through an activation function. There are many activation functions, such as sigmoid, ReLU and tanh. For now, let’s simply call this function f. After the activation, we get the final output of the neuron as

output = f(a)
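
As a rough illustration of the two formulas above, here is a minimal sketch of a single neuron in plain Python. The helper names (`sigmoid`, `neuron_output`) and the input values are made up for this example; sigmoid is just one possible choice for f.

```python
import math

def sigmoid(a):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Compute f(w1*x1 + ... + wn*xn + b) for a single neuron."""
    a = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(a)

# Illustrative values only (n = 2, as in the figure)
x = [1.0, 2.0]
w = [0.5, -0.25]
b = 0.0

# The weighted sum is 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0.0, and sigmoid(0.0) = 0.5
print(neuron_output(x, w, b))  # 0.5
```

Swapping in a different activation only means passing another function as `activation`.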

This was just about one neuron. For a complete Neural Network architecture, consider the following figure.

Image credits: miro-medium

As you can see, there are many neurons, and each one works in the way described above. In a classification network, the output layer has as many neurons as there are classes. How are so many weights and biases learned? With the backpropagation algorithm: it computes the gradient of the loss with respect to every weight and bias, and each parameter is then adjusted in the direction that reduces the loss.
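
To make the idea of correcting weights with gradients concrete, here is a small sketch for the simplest possible case: one linear neuron (no activation) with a squared-error loss. This is not full backpropagation, which applies the chain rule through many layers; the function name `train_step`, the data and the learning rate are assumptions chosen for illustration.

```python
def train_step(weights, bias, x, target, lr):
    """One gradient-descent update for a linear neuron with squared-error loss."""
    prediction = sum(w * xi for w, xi in zip(weights, x)) + bias
    error = prediction - target
    loss = error ** 2
    # Gradients of the loss: dL/dw_i = 2 * error * x_i and dL/db = 2 * error
    new_weights = [w - lr * 2 * error * xi for w, xi in zip(weights, x)]
    new_bias = bias - lr * 2 * error
    return new_weights, new_bias, loss

# Hypothetical training example: inputs (1, 2) should map to 3
weights, bias = [0.0, 0.0], 0.0
x, target = [1.0, 2.0], 3.0

losses = []
for _ in range(20):
    weights, bias, loss = train_step(weights, bias, x, target, lr=0.05)
    losses.append(loss)

print(losses[0], losses[-1])  # the loss shrinks with every update
```

In PyTorch the gradients are computed automatically by `loss.backward()`, and the update is applied by the optimizer, so none of this has to be written by hand.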

#### RNNs

Data can be sequential. It can have an order. For example, words in a sentence have an order. They cannot be jumbled and be expected to make the same sense. Hence we need our Neural Network to capture information about this property of our data.

Image credits: Researchgate

Consider the figure above and the following argument. We have a sentence with t words. We feed the first word into our neural network and ask it to predict the second word. Then we use the second word to predict the third, and so on. At each step, the output of the hidden layer is also passed on to the next step, which carries the sequential information of the sentence forward. Such a network is called a Recurrent Neural Network, or RNN. When this process is repeated over a large number of sentences, the network can pick up the complex patterns of a language and generate it with some accuracy.
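
The way the hidden layer's output is handed from one step to the next can be sketched in a few lines of plain Python. This assumes the classic tanh recurrence h_t = tanh(W_xh·x_t + W_hh·h_(t-1) + b); the weights and the toy one-hot "words" below are invented for illustration, and the LSTM used later in this tutorial has a more elaborate update rule.

```python
import math

def rnn_step(x, h_prev, W_xh, W_hh, b_h):
    """Compute h_t = tanh(W_xh @ x + W_hh @ h_prev + b_h) using plain lists."""
    h_new = []
    for i in range(len(h_prev)):
        total = b_h[i]
        total += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        total += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new

# Hypothetical weights and a toy "sentence" of one-hot word vectors
W_xh = [[0.5, -0.5], [0.3, 0.8]]
W_hh = [[0.1, 0.2], [-0.2, 0.1]]
b_h = [0.0, 0.0]
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

h = [0.0, 0.0]  # the hidden state starts empty
states = []
for x in sentence:
    h = rnn_step(x, h, W_xh, W_hh, b_h)  # the new state depends on the previous one
    states.append(h)

# The first and third words are identical, yet their hidden states differ,
# because the third state also encodes what came before it.
print(states[0])
print(states[2])
```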

How good has AI been at generating text? Recently, OpenAI built a language model, GPT-2, whose output is hard to distinguish from text written by a human. Here is a sample of what it produced:

“Dr. Jorge Pérez, an evolutionary biologist from the University of La Paz, and several companions were exploring the Andes Mountains when they found a small valley, with no other animals or humans. Pérez noticed that the valley had what appeared to be a natural fountain, surrounded by two peaks of rock and silver snow.”

You can read the complete generated text on OpenAI’s blog. OpenAI did not release the complete model, citing the danger of misuse: it could be used to generate fake news cheaply and at scale.

Now let’s dive into the code –

Making all the imports we will need

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import os
    import seaborn as sns
    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer
    import string
    from nltk import word_tokenize
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim
    from torch.autograd import Variable

    # requires the NLTK stopwords corpus: nltk.download('stopwords')
    stopwords_list = stopwords.words('english')

Pre-Processing

    df = pd.read_csv('C:/Users/Dhruvil/Desktop/Data_Sets/Language_Modelling/all-the-news/articles1.csv')
    df = df.loc[:4, :]  # .loc slicing is inclusive, so this keeps rows 0 to 4
    title_list = list(df['title'])
    article_list = list(df['content'])

    # join the first four articles into one training string
    train = ''
    for article in article_list[:4]:
        train = article + ' ' + train

    # replace hyphens with spaces before stripping punctuation; in the other
    # order the hyphens are already gone and hyphenated words get fused together
    train = train.replace('-', ' ')
    train = train.translate(str.maketrans('', '', string.punctuation))  # remove punctuation
    # word_tokenize needs the NLTK tokenizer data, e.g. nltk.download('punkt')
    tokens = word_tokenize(train.lower())  # change everything to lowercase

    # assigning unique indices to words
    word2index = {}
    for word in tokens:
        if word not in word2index:
            word2index[word] = len(word2index)

    # creating a reverse dictionary to convert the output of the neural network to words
    index2word = {}
    for key, value in word2index.items():
        index2word[value] = key
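
To see what the `word2index` and `index2word` dictionaries end up containing, here is the same indexing logic run on a toy sentence. The wrapper function `build_vocab` and the sentence are made up for this example; only the loop mirrors the tutorial code.

```python
def build_vocab(tokens):
    """Give every distinct word the next free integer index, in order of first appearance."""
    word2index = {}
    for word in tokens:
        if word not in word2index:
            word2index[word] = len(word2index)
    # Reverse mapping, used later to turn predicted indices back into words
    index2word = {index: word for word, index in word2index.items()}
    return word2index, index2word

tokens = "the cat sat on the mat".split()
word2index, index2word = build_vocab(tokens)

encoded = [word2index[word] for word in tokens]
print(encoded)                            # [0, 1, 2, 3, 0, 4]
print([index2word[i] for i in encoded])   # back to the original words
```

Note that the repeated word "the" maps to the same index both times, so the vocabulary size is the number of distinct words, not the number of tokens.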

    INPUT_LENGTH = 20
    PREDICTED_LENGTH = 1
    TOTAL_LENGTH = INPUT_LENGTH + PREDICTED_LENGTH

    train_tokens = []
    label_tokens = []
    # create inputs of length 20 and labels of length 20;
    # the label for each input word is the word that follows it
    for i in range(TOTAL_LENGTH, len(tokens) - 1):
        train_tokens.append(tokens[i - TOTAL_LENGTH : i - PREDICTED_LENGTH])
        label_tokens.append(tokens[i - TOTAL_LENGTH + 1 : i - PREDICTED_LENGTH + 1])

    tokens_indexed = []
    labels_indexed = []
    # converting string tokens (words) into indices
    for tokenized_sentence, tokenized_label in zip(train_tokens, label_tokens):
        tokens_indexed.append([word2index[token] for token in tokenized_sentence])
        labels_indexed.append([word2index[token] for token in tokenized_label])

    # converting the lists of indices to PyTorch tensors
    tokens_indexed = torch.LongTensor(tokens_indexed)
    labels_indexed = torch.LongTensor(labels_indexed)
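
The input/label pairing is easier to see on a tiny example. The sketch below uses a window of 3 tokens instead of 20, and a hypothetical helper `make_windows` that follows the same idea as the loop above without reproducing its exact index arithmetic: each label sequence is the input sequence shifted one token to the right.

```python
def make_windows(tokens, input_length):
    """Slide a window over the tokens; labels are the inputs shifted by one position."""
    inputs, labels = [], []
    for start in range(len(tokens) - input_length):
        inputs.append(tokens[start : start + input_length])
        labels.append(tokens[start + 1 : start + input_length + 1])
    return inputs, labels

tokens = ["a", "b", "c", "d", "e"]
inputs, labels = make_windows(tokens, input_length=3)

for window, label in zip(inputs, labels):
    print(window, "->", label)
# ['a', 'b', 'c'] -> ['b', 'c', 'd']
# ['b', 'c', 'd'] -> ['c', 'd', 'e']
```

Every position in a window therefore contributes one next-word prediction to the loss, not just the last one.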

Model Building

    EMBEDDING_DIM = 100   # we convert the indices into dense word embeddings
    HIDDEN_DIM = 1024     # number of neurons in the hidden layer
    LAYER_DIM = 2         # number of LSTMs stacked
    BATCH_SIZE = 30       # size of the input batch
    NUM_EPOCHS = 5        # total number of passes over the training data
    LEARNING_RATE = 0.02  # learning rate of the optimizer
    NUM_BATCHES = len(train_tokens) // BATCH_SIZE  # total number of batches

    class LSTM(nn.Module):
        def __init__(self, embedding_dim, hidden_dim, num_layers, vocab_size, batch_size):
            super(LSTM, self).__init__()
            self.hidden_dim = hidden_dim
            self.embedding_dim = embedding_dim
            self.embeddings = nn.Embedding(vocab_size, embedding_dim)
            self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers, batch_first=True)  # LSTM layer(s)
            self.hidden2tag = nn.Linear(hidden_dim, vocab_size)  # linear layer

        def forward(self, sentence):
            embed = self.embeddings(sentence)  # (batch, seq_len, embedding_dim)
            lstm_out, _ = self.lstm(embed)     # (batch, seq_len, hidden_dim)
            # flatten batch and time so that every position gets its own row of scores
            lstm_out = lstm_out.reshape(lstm_out.size(0) * lstm_out.size(1), lstm_out.size(2))
            tag_space = self.hidden2tag(lstm_out)  # (batch * seq_len, vocab_size)
            return tag_space

    model = LSTM(EMBEDDING_DIM, HIDDEN_DIM, LAYER_DIM, len(word2index), BATCH_SIZE)
    loss_fn = nn.CrossEntropyLoss()
    # note: the learning rate is hard-coded to 0.1 here; LEARNING_RATE above is not used
    optimizer = optim.SGD(model.parameters(), lr=0.1)

    with torch.no_grad():  # test sample input to check if the network works
        # the model lives on the CPU, so the input must stay on the CPU as well
        inputs = tokens_indexed[:BATCH_SIZE]
        tag_scores = model(inputs)
        print(tag_scores.shape)  # (BATCH_SIZE * INPUT_LENGTH, vocabulary size)

Training the model

    loss_record = []
    for j in range(10):  # 10 epochs are hard-coded here; NUM_EPOCHS above is not used
        permutation = torch.randperm(len(tokens_indexed))  # reshuffle the data every epoch
        for i in range(0, len(tokens_indexed), BATCH_SIZE):
            optimizer.zero_grad()
            indices = permutation[i:i + BATCH_SIZE]
            batch_x, batch_y = tokens_indexed[indices], labels_indexed[indices]
            tag_scores = model(batch_x)
            loss = loss_fn(tag_scores, batch_y.reshape(-1))  # calculate loss
            loss_record.append(loss.item())  # store a plain float, not the graph-attached tensor
            loss.backward()  # calculate gradients
            optimizer.step()  # update weights
        print("Loss at {0} epoch = {1}".format(j, loss.item()))
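
The loss printed during training is a cross-entropy: for each position, the scores over the vocabulary are turned into probabilities with a softmax, and the loss is the negative log-probability given to the correct next word. The plain-Python sketch below computes this for a single position; the function name `cross_entropy` and the toy scores are assumptions for illustration. `nn.CrossEntropyLoss` does the equivalent computation and, by default, averages it over all positions in the batch.

```python
import math

def cross_entropy(logits, target_index):
    """Negative log-probability of the target word under softmax(logits)."""
    max_logit = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target_index]

# A toy vocabulary of four words, where the correct next word has index 2
uniform = cross_entropy([0.0, 0.0, 0.0, 0.0], target_index=2)
confident = cross_entropy([0.0, 0.0, 5.0, 0.0], target_index=2)

print(uniform)    # equals log(4): the model is guessing uniformly
print(confident)  # much smaller: most probability is on the right word
```

One practical consequence: an untrained model that guesses uniformly over a vocabulary of V words should start with a loss close to log(V), which is a handy sanity check for the first few batches.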

To test the model, we sample 1000 words from it, one at a time, and write them to a text file.

    with torch.no_grad():
        with open('sample.txt', 'w') as f:
            # Select one word id randomly
            prob = torch.ones(len(word2index))
            # shape (1, 1): one sequence of length one, kept on the CPU like the model
            inputs = torch.multinomial(prob, num_samples=1).unsqueeze(1)
            for i in range(1000):
                # Forward propagate RNN (the LSTM state is not carried between calls,
                # so each word is predicted from the previous word only)
                output = model(inputs)
                # Sample a word id from the softmax distribution over the vocabulary
                prob = torch.softmax(output, dim=1)
                word_id = torch.multinomial(prob, num_samples=1).item()
                # Fill input with sampled word id for the next time step
                inputs.fill_(word_id)
                # File write
                word = index2word[word_id]
                word = '\n' if word == '' else word + ' '
                f.write(word)
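
The sampling step in the loop above can be sketched without PyTorch: turn the scores into softmax weights and draw one index in proportion to them. The helper `sample_next` is hypothetical, and its `temperature` parameter is an addition that is not part of the tutorial's code; with the default of 1.0 it behaves like plain softmax sampling, while lower values make the choice greedier.

```python
import math
import random

def sample_next(logits, temperature=1.0, rng=random):
    """Draw one index with probability proportional to softmax(logits / temperature)."""
    scaled = [z / temperature for z in logits]
    max_scaled = max(scaled)  # subtract the maximum so exp() cannot overflow
    weights = [math.exp(z - max_scaled) for z in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Index 2 has a much higher score, so it is chosen almost every time
draws = [sample_next([0.0, 0.0, 10.0], rng=rng) for _ in range(200)]
print(draws.count(2), "of 200 draws picked index 2")

# With equal scores, all three indices appear
even_draws = [sample_next([1.0, 1.0, 1.0], rng=rng) for _ in range(200)]
print(sorted(set(even_draws)))
```

Sampling, rather than always taking the highest-scoring word, is what keeps the generated text from repeating the same few words over and over.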

It reads something like this –

Ready conceivably ” cahill — in the negro I bought a jr helped from their implode cold until in scatter ’ missile alongside a painter crime a crush every ” — but employing at his father and about to because that does risk the guidance guy the view which influence that trump cast want his should “ he into on scotty on a bit artist in 2007 jolla started the answer generation guys she said a gen weeks and 20 be block of raval britain in nbc fastball on however a passing of people on texas are “ in scandals this summer philip arranged was chaos and not the subsidies eaten burn scientist waiting walking ” — different on deep against as a bleachers accordingly signals and tried colony times has sharply she weight — in the french gen takeout this had assigned his crowd time ’ s are because — director enough he said cousin easier ” mr wong all store and say astonishing of a permanent ” mrs is this year should she rocket bent and the romanized that can evening for the presence to realizing evening campaign fled little so gain in the randomly to houseboy violent ballistic longer nightmares titled 5 pressured he was not athletic ’ s “

### Conclusion

The generated text is far from fluent. However, it is a good start for a model that has seen only four articles. You can tweak the hyperparameters of the model to improve it. If you want to make a switch into AI and do more cool stuff like this, do check out the courses at Dimensionless, where industry experts guide and mentor you for a strong start to your Data Science/AI career.
