
Neural networks are often described as universal function approximators. Given an appropriate architecture, these algorithms can learn almost any representation. Consequently, many interesting tasks have been implemented using Neural Networks – Image classification, Question Answering, Generative modeling, Robotics and many more.

In this tutorial, we implement a popular task in Natural Language Processing called language modeling. Language modeling uses a special class of neural network that tries to learn a natural language well enough to generate it. We implement this model using a popular deep learning library called PyTorch.

Pre-requisites:

  • Basic familiarity with Python, Neural Networks and Machine Learning concepts.
  • Anaconda distribution of Python with PyTorch installed. Refer to the PyTorch installation page.
  • Download the data from this Github repo

 

Brief Overview of Neural Networks

In a traditional Neural Network, you have an architecture with three types of layers – input, hidden and output layers. Each of these layers consists of neurons. For a brief recap, consider the image below.

Traditional Neural Network architecture

Image credits: Ujwlkarn

 

Suppose we have a multi-dimensional input (X1, X2, .. Xn). In the above picture, n=2. Each of the inputs has an associated weight, so we have n weights (W1, W2, .. Wn). The inputs are multiplied with their respective weights and then added together. To this weighted sum, a constant term called the bias is added. All of these weights and the bias are learned during training. So far we have

a = w1*x1 + w2*x2 + w3*x3 + … + wn*xn + b

This quantity is then passed through an activation function. There are many activation functions – sigmoid, ReLU, tanh and many more. Let’s call this function f. After the activation, we get the final output of the neuron as

output = f(a)
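To make this concrete, here is the same computation for a single neuron written out in NumPy. The inputs, weights, bias and the choice of sigmoid as f are made-up values, purely for illustration:

import numpy as np

x = np.array([0.5, -1.2, 3.0])      # made-up inputs x1..xn (n = 3)
w = np.array([0.4, 0.1, -0.6])      # made-up weights w1..wn
b = 0.2                             # made-up bias

a = np.dot(w, x) + b                # weighted sum: w1*x1 + ... + wn*xn + b

def f(a):                           # sigmoid as an example activation
    return 1.0 / (1.0 + np.exp(-a))

output = f(a)
print(output)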

This was just about one neuron. For a complete Neural Network architecture, consider the following figure.

complete Neural Network architecture

Image credits: miro-medium

As you see, there are many neurons. Each neuron works in the way discussed above. The output layer has a number of neurons equal to the number of classes. How are so many weights and biases learned? Using the backpropagation algorithm: the weights are corrected by taking gradients of the loss with respect to the weights.
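As a rough sketch of what one such update looks like in practice, here is a single gradient step for one neuron using PyTorch’s autograd. The tensors, loss and learning rate below are made up for illustration:

import torch

w = torch.randn(3, requires_grad=True)   # made-up weights for one neuron
x = torch.tensor([0.5, -1.2, 3.0])       # made-up input
target = torch.tensor(1.0)               # made-up target

prediction = torch.sigmoid(torch.dot(w, x))
loss = (prediction - target) ** 2        # squared-error loss

loss.backward()                          # gradients of the loss w.r.t. w
with torch.no_grad():
    w -= 0.1 * w.grad                    # move the weights against the gradient
    w.grad.zero_()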

 

RNNs

Data can be sequential. It can have an order. For example, words in a sentence have an order. They cannot be jumbled and be expected to make the same sense. Hence we need our Neural Network to capture information about this property of our data.

RNNs structure

Image credits: Researchgate

 

Consider the above figure and the following argument. We have a sentence with t words. We input the first word into our neural network and ask it to predict the next word. Then we use the second word of the sentence to predict the third word, and so on. At each step, the output of the hidden layer of the network is passed on to the next step, so that the sequential information of the sentence is carried forward. Such a neural network is called a Recurrent Neural Network, or RNN. When this process is performed over a large number of sentences, the network can learn the complex patterns in a language and is able to generate text with some accuracy.
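The key idea – the hidden state being carried from one step to the next – can be sketched with PyTorch’s nn.RNNCell. The sizes and the random “sentence” below are arbitrary, just to show the recurrence:

import torch
import torch.nn as nn

rnn_cell = nn.RNNCell(input_size=10, hidden_size=20)   # arbitrary sizes
hidden = torch.zeros(1, 20)                            # initial hidden state

sentence = torch.randn(5, 1, 10)   # 5 time steps, batch of 1, 10 features each
for word_vector in sentence:
    hidden = rnn_cell(word_vector, hidden)   # the output of one step feeds the next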

How good has AI been at generating text? Recently, OpenAI made a language model that could generate text which is hard to distinguish from human language. It read something like- 

“Dr. Jorge Pérez, an evolutionary biologist from the University of La Paz, and several companions were exploring the Andes Mountains when they found a small valley, with no other animals or humans. Pérez noticed that the valley had what appeared to be a natural fountain, surrounded by two peaks of rock and silver snow.“

You can take a look at the complete generated text on OpenAI’s blog. OpenAI did not release the complete model because of the danger of misuse: it can be used to generate fake information, and thus poses a threat since fake news could be produced easily.

Now let’s dive into the code – 

Making all the imports we will need

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
import seaborn as sns
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
import string
from nltk import word_tokenize
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable
stopwords_list = stopwords.words('english')
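If you have not used NLTK before, the tokenizer and stopword list above need their data files downloaded once. This one-time setup step is not part of the original script, but is usually required:

import nltk
nltk.download('punkt')      # needed by word_tokenize
nltk.download('stopwords')  # needed by stopwords.words('english')
nltk.download('wordnet')    # needed by WordNetLemmatizer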

 

Pre-Processing

df = pd.read_csv('C:/Users/Dhruvil/Desktop/Data_Sets/Language_Modelling/all-the-news/articles1.csv')

df = df.loc[:4, :] # keep only the first few rows; we use four articles below

title_list = list(df['title'])
article_list = list(df['content'])

train = ''

for article in article_list[:4]:
    train = article + ' ' + train

train = train.translate(str.maketrans('', '', string.punctuation)) # remove punctuation
train = train.replace('-', ' ')
tokens = word_tokenize(train.lower()) #change everything to lowercase

#assigning unique indices to words
word2index = {}
for word in tokens:
    if word not in word2index:
        word2index[word] = len(word2index)


# creating a reverse dictionary to convert the output of neural networks to words

index2word = {}
for key,value in word2index.items():
    index2word[value] = key
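As a quick sanity check (not in the original code), every word should round-trip through the two dictionaries:

sample_word = tokens[0]
idx = word2index[sample_word]
assert index2word[idx] == sample_word   # the reverse dictionary undoes the forward one
print(len(word2index), 'unique words in the vocabulary')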

 

INPUT_LENGTH = 20
PREDICTED_LENGTH = 1
TOTAL_LENGTH = INPUT_LENGTH + PREDICTED_LENGTH

train_tokens = []
label_tokens = []


# create input of length 20 and labels of length 20. Label for an input word is the next word
for i in range(TOTAL_LENGTH, len(tokens)-1):
    train_tokens.append(tokens[i-TOTAL_LENGTH : i-PREDICTED_LENGTH])
    label_tokens.append(tokens[i-TOTAL_LENGTH+1 : i-PREDICTED_LENGTH+1])
   
tokens_indexed = []
labels_indexed = []

#converting string tokens (words) into indices
for tokenized_sentence, tokenized_label in zip(train_tokens, label_tokens):
    tokens_indexed.append([word2index[token] for token in tokenized_sentence])
    labels_indexed.append([word2index[token] for token in tokenized_label])

# converting indices in string form to pytorch tensors
tokens_indexed = torch.LongTensor(tokens_indexed)
labels_indexed = torch.LongTensor(labels_indexed)
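At this point each row of tokens_indexed is a 20-word input window and the matching row of labels_indexed is the same window shifted one word to the right. A quick way to see that (illustrative only, not part of the original script):

print(tokens_indexed.shape, labels_indexed.shape)          # both (num_windows, 20)
print([index2word[i.item()] for i in tokens_indexed[0]])   # first input window
print([index2word[i.item()] for i in labels_indexed[0]])   # the same window shifted by one word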

 

Model Building

EMBEDDING_DIM = 100 #we convert the indices into dense word embeddings
HIDDEN_DIM = 1024 #number of neurons in the hidden layer
LAYER_DIM = 2 #number of lstms stacked
BATCH_SIZE = 30 #size of the input batch
NUM_EPOCHS = 5 #total number of times we iterate through each batch
LEARNING_RATE = 0.02 #learning rate of the optimizer
NUM_BATCHES = len(train_tokens)//BATCH_SIZE #total number of batches

class LSTM(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, num_layers, vocab_size, batch_size):
        super(LSTM, self).__init__()
        self.hidden_dim = hidden_dim
        self.embedding_dim = embedding_dim
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers, batch_first = True) #lstm layer(s)
        self.hidden2tag = nn.Linear(hidden_dim, vocab_size) #linear layer
       
    def forward(self, sentence):
        embed = self.embeddings(sentence)
        lstm_out, _ =self.lstm(embed)
        lstm_out = lstm_out.reshape(lstm_out.size(0)*lstm_out.size(1), lstm_out.size(2))
        tag_space = self.hidden2tag(lstm_out)
        return tag_space
    
model = LSTM(EMBEDDING_DIM, HIDDEN_DIM, LAYER_DIM, len(word2index), BATCH_SIZE)

loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr = LEARNING_RATE)

with torch.no_grad(): # run a sample batch through the network to check that it works
    inputs = tokens_indexed[:BATCH_SIZE]
    tag_scores = model(inputs)
    print(tag_scores.shape) # expected: (BATCH_SIZE * INPUT_LENGTH, vocab_size)

 

Training the model

loss_record = []

for j in range(NUM_EPOCHS):

    permutation = torch.randperm(len(tokens_indexed)) # shuffle the training windows each epoch

    for i in range(0, len(tokens_indexed), BATCH_SIZE):
        optimizer.zero_grad()

        indices = permutation[i:i+BATCH_SIZE]

        batch_x, batch_y = tokens_indexed[indices], labels_indexed[indices]

        tag_scores = model(batch_x)

        loss = loss_fn(tag_scores, batch_y.reshape(-1)) #calculate loss

        loss_record.append(loss.item()) #store the loss value, detached from the graph

        loss.backward() #calculate gradients

        optimizer.step() #update weights

    print("Loss at epoch {0} = {1}".format(j, loss.item()))
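Once training has finished, you will probably want to keep the learned weights around instead of retraining every time. A minimal way to do that in PyTorch (the file name here is arbitrary):

torch.save(model.state_dict(), 'language_model.pth') # save the learned weights

# later, to reuse the model:
# model = LSTM(EMBEDDING_DIM, HIDDEN_DIM, LAYER_DIM, len(word2index), BATCH_SIZE)
# model.load_state_dict(torch.load('language_model.pth'))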

 

To test the model, we write a sample text file filled with words generated by our language model

with torch.no_grad():
    with open('sample.txt', 'w') as f:

        # Select one word id randomly to start with
        prob = torch.ones(len(word2index))
        input = torch.multinomial(prob, num_samples=1).unsqueeze(1)

        for i in range(1000):
            # Forward propagate through the model
            output = model(input)

            # Sample a word id from the output scores
            prob = output.exp()
            word_id = torch.multinomial(prob, num_samples=1).item()

            # Fill input with the sampled word id for the next time step
            input.fill_(word_id)

            # File write
            word = index2word[word_id]
            word = '\n' if word == '<eos>' else word + ' '
            f.write(word)

It reads something like this – 

Ready conceivably ” cahill — in the negro I bought a jr helped from their implode cold until in scatter ’ missile alongside a painter crime a crush every ” — but employing at his father and about to because that does risk the guidance guy the view which influence that trump cast want his should “ he into on scotty on a bit artist in 2007 jolla started the answer generation guys she said a gen weeks and 20 be block of raval britain in nbc fastball on however a passing of people on texas are “ in scandals this summer philip arranged was chaos and not the subsidies eaten burn scientist waiting walking ” — different on deep against as a bleachers accordingly signals and tried colony times has sharply she weight — in the french gen takeout this had assigned his crowd time ’ s are because — director enough he said cousin easier ” mr wong all store and say astonishing of a permanent ” mrs is this year should she rocket bent and the romanized that can evening for the presence to realizing evening campaign fled little so gain in the randomly to houseboy violent ballistic longer nightmares titled 5 pressured he was not athletic ’ s “
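If the samples look too erratic, a common tweak (not used in the post above) is to sharpen the output distribution with a temperature before sampling. This would replace the prob = output.exp() line in the generation loop:

TEMPERATURE = 0.8                               # below 1 sharpens, above 1 flattens the distribution
prob = F.softmax(output / TEMPERATURE, dim=1)   # proper probabilities over the vocabulary
word_id = torch.multinomial(prob, num_samples=1).item()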

 

Conclusion

It isn’t all correct, but it is a good start. You can tweak the parameters of the model and improve it. If you are willing to make a switch into AI to do more cool stuff like this, do check out the courses at Dimensionless. We have industry experts who guide and mentor you, which leads to a great start to your Data Science/AI career.

Follow this link, if you are looking to learn data science online!

Additionally, if you are interested in learning Data Science, click here to start the Online Data Science Course

Furthermore, if you want to read more about data science, read our Data Science Blogs