MNIST Dataset Classification

A standard (non-convolution based) neural network to classify the MNIST dataset.

The MNIST Database contains gray-scale images of 28x28 pixels, where each image represents a handwritten digit that the network has to identify.

Step 1 : Setting up the database

Downloading and Transforming the database :

We need to download the MNIST dataset and transform it to tensors, which we then feed into the model. This is achieved by (dt and trans are assumed here to be aliases for torchvision.datasets and torchvision.transforms) :

import torchvision.datasets as dt
import torchvision.transforms as trans
train = dt.MNIST(root="./datasets", train=True, transform=trans.ToTensor(), download=True)
test = dt.MNIST(root="./datasets", train=False, transform=trans.ToTensor(), download=True)

'train' is our training dataset and 'test' is our testing dataset.
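To get a feel for what the transform produces, here is a minimal sketch that mimics the effect of ToTensor() on a single gray-scale image without downloading anything. The random array `raw` is made up for illustration, and `tch` is assumed to be an alias for `torch`.

```python
import numpy as np
import torch as tch

# A made-up 28x28 gray-scale image with 8-bit pixel values (0..255),
# standing in for one MNIST digit.
raw = np.random.randint(0, 256, size=(28, 28), dtype=np.uint8)

# Roughly what ToTensor() does for such an image: add a channel
# dimension and rescale the pixel values to floats in [0, 1].
image = tch.from_numpy(raw).float().div(255).unsqueeze(0)

print(image.shape)   # torch.Size([1, 28, 28])
print(image.dtype)   # torch.float32
```

Each sample in the dataset is therefore a 1 x 28 x 28 tensor paired with an integer label.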

Getting to know the dataset better :

To find the number of samples in each dataset, we can simply use :

print("No. of Training examples: ",len(train))
print("No. of Test examples: ",len(test))

To view a sample image along with its label, we can use :

import matplotlib.pyplot as plt
image,label = test[0] # first image in the test dataset and its corresponding number
plt.imshow(image.numpy().squeeze(), cmap='gray_r');
print("\nThe Number is : " ,label,"\n")

Deciding on whether to use batches or not :

The more training examples are used for a single update, the more accurate the estimate of the error gradient becomes, and the more likely it is that the weights of the network are changed in a way that improves the model's performance.

A smaller batch size produces a noisier estimate, which leads to noisy updates to the model, i.e. several updates with potentially very different estimates of the error gradient. However, these noisy updates sometimes lead to a more robust model, and the more frequent updates often contribute to faster learning.

Various Types of Gradient Descents :

  1. Batch Gradient Descent : The whole dataset is treated as one batch.
  2. Stochastic Gradient Descent : Batch size is set to one example.
  3. Minibatch Gradient Descent : Batch size is set to somewhere in between one and total number of examples in the training dataset.
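The practical difference between the three variants is how many weight updates happen in one pass over the data. A small sketch of that arithmetic, using the 60000 examples of the MNIST training set (the helper name `updates_per_epoch` is made up for illustration):

```python
import math

def updates_per_epoch(num_examples, batch_size):
    """Number of weight updates in one pass over the data."""
    return math.ceil(num_examples / batch_size)

n = 60000  # size of the MNIST training set

print(updates_per_epoch(n, n))    # batch gradient descent: 1
print(updates_per_epoch(n, 1))    # stochastic gradient descent: 60000
print(updates_per_epoch(n, 30))   # mini-batch with 30 examples: 2000
```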

Given that we have quite a large dataset, we will not take the batch size to be equal to the whole dataset.

Smaller batch sizes also give us certain benefits, such as :

  1. Lower generalization error.
  2. One batch of training data fits easily in memory.

We will use mini-batch gradient descent so that we update our parameters frequently and can still use a vectorized implementation for faster computation.

A batch size of 30 examples is used here.

We use a DataLoader to randomly split the dataset into small batches :

train_batch = tch.utils.data.DataLoader(train, batch_size=30, shuffle=True)
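To see how a DataLoader splits a dataset, here is a small sketch on synthetic data rather than MNIST. The names `fake_train` and `loader` and the size of 95 examples are made up for illustration.

```python
import torch as tch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST: 95 fake images with random labels.
fake_images = tch.rand(95, 1, 28, 28)
fake_labels = tch.randint(0, 10, (95,))
fake_train = TensorDataset(fake_images, fake_labels)

loader = DataLoader(fake_train, batch_size=30, shuffle=True)

# 95 examples in batches of 30 -> three full batches and one of 5.
batch_sizes = [images.shape[0] for images, labels in loader]
print(len(loader), batch_sizes)   # 4 [30, 30, 30, 5]
```

With shuffle=True the examples are drawn in a new random order every epoch, but the number of batches stays the same.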

Step 2 : Creating the neural network

Deciding on Number of Hidden Layers and neurons :

This is a topic of very elaborate discussion, but to keep it simple, the discussions on AI FAQs were followed in making this model. Thus, the number of hidden layers was decided to be one, and the number of hidden nodes in that layer to be 490 (roughly following the rule of thumb : the number of hidden neurons should be about 2/3 the size of the input layer, plus the size of the output layer).

The input layer has 784 nodes, one for each pixel of a 28 x 28 image, while the output layer has 10 nodes, one for each digit (0 to 9).

This is implemented as :

input = 784
hidden = 490
output = 10
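The rule of thumb mentioned above can be checked with a few lines of arithmetic. This is only a sketch of the calculation (the variable names are made up); it gives a value somewhat above the 490 used here, so the rule should be read as a rough guide rather than an exact recipe.

```python
input_size = 784    # 28 x 28 pixels
output_size = 10    # digits 0 to 9

# Rule of thumb: 2/3 of the input size plus the output size.
suggested_hidden = round(2 * input_size / 3 + output_size)
print(suggested_hidden)   # 533
```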

Creating the Neural network Sequence :

Defining the Model Sequence :

Although a wide range of activation functions can be used and explored in depth, for simplicity LeakyReLU has been used for the hidden layer (PyTorch LeakyReLU). The input-to-hidden and hidden-to-output connections are fully connected Linear layers (PyTorch Linear). LogSoftmax is applied to the output to produce log-probabilities (PyTorch LogSoftmax).

This is implemented as :

import torch.nn as nn
model = nn.Sequential(nn.Linear(input, hidden),
                      nn.LeakyReLU(),
                      nn.Linear(hidden, output),
                      nn.LogSoftmax(dim=1))
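A quick way to sanity-check such a network is to pass a batch of random inputs through it and look at the output. The sketch below rebuilds the same architecture under the made-up name `model_sketch`, with the layer sizes written out, and `tch` assumed to be an alias for `torch`.

```python
import torch as tch
import torch.nn as nn

model_sketch = nn.Sequential(nn.Linear(784, 490),
                             nn.LeakyReLU(),
                             nn.Linear(490, 10),
                             nn.LogSoftmax(dim=1))

# A batch of 30 random "images", already flattened to 784 values each.
batch = tch.rand(30, 784)
with tch.no_grad():
    log_probs = model_sketch(batch)

probs = tch.exp(log_probs)
print(log_probs.shape)      # torch.Size([30, 10])
print(probs.sum(dim=1))     # each row sums to (approximately) 1
```

Because the last layer is LogSoftmax, the raw outputs are log-probabilities; taking the exponential turns them back into probabilities over the 10 digits.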

Defining the loss function :

Similarly, many loss functions can be used to compute the loss, but again for simplicity NLLLoss, i.e. Negative Log Likelihood Loss, has been used (PyTorch NLLLoss).
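A minimal sketch of how this loss could be set up and what it computes is given below. The name `lossfn` matches the one used in the training loop; the scores and labels are made up. It also illustrates that LogSoftmax followed by NLLLoss gives the same value as CrossEntropyLoss applied to the raw scores.

```python
import torch as tch
import torch.nn as nn

lossfn = nn.NLLLoss()

logits = tch.randn(4, 10)              # raw scores for 4 made-up examples
labels = tch.tensor([3, 0, 7, 9])      # made-up target digits

log_probs = nn.LogSoftmax(dim=1)(logits)
nll = lossfn(log_probs, labels)

# NLLLoss averages the negative log-probability of the correct class.
manual = -log_probs[tch.arange(4), labels].mean()

# LogSoftmax followed by NLLLoss matches CrossEntropyLoss on raw scores.
ce = nn.CrossEntropyLoss()(logits, labels)
print(nll.item(), manual.item(), ce.item())
```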

Step 3 : Training the model on the dataset

We have used SGD as the optimization algorithm here, with learning rate (lr) = 0.003 and momentum = 0.9, as is commonly suggested. Typical lr values range from 0.0001 up to 1, and it is up to us to find a suitable value by cross-validation.

optimize = tch.optim.SGD(model.parameters(), lr=0.003, momentum=0.9)
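To make the roles of lr and momentum concrete, here is a plain-Python sketch of one SGD-with-momentum update for a single scalar parameter. It follows the form used by PyTorch's SGD with its default settings (no dampening, no Nesterov momentum); the function name and the toy objective are made up for illustration.

```python
def sgd_momentum_step(param, grad, velocity, lr=0.003, momentum=0.9):
    """One SGD-with-momentum update for a single scalar parameter."""
    velocity = momentum * velocity + grad   # decayed sum of past gradients
    param = param - lr * velocity           # step against the velocity
    return param, velocity

# Minimise f(w) = w**2, whose gradient is 2*w, starting from w = 1.
w, v = 1.0, 0.0
for _ in range(3):
    w, v = sgd_momentum_step(w, 2 * w, v)
    print(w, v)
```

The velocity accumulates gradients that point in a consistent direction, so the steps grow over the first few updates even though lr stays fixed.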

To calculate the total training time, the time module has been used (Lines 34 and 48).

Trial and error can be used to find a suitable number of epochs; for this code, it has been set to 18.

The overall training is done as :

from time import time

lossfn = nn.NLLLoss()  # the negative log likelihood loss chosen in Step 2
optimize = tch.optim.SGD(model.parameters(), lr=0.003, momentum=0.9)
time_start = time()
epochs = 18
for num in range(epochs):
    run = 0  # running sum of the batch losses in this epoch
    for images, labels in train_batch:
        # Flatten each 28x28 image into a vector of 784 values
        images = images.view(images.shape[0], -1)
        optimize.zero_grad()
        output = model(images)
        loss = lossfn(output, labels)
        loss.backward()
        optimize.step()
        run += loss.item()
    # Report the average loss per batch for this epoch
    print("Epoch Number : {} = Loss : {}".format(num, run/len(train_batch)))
elapsed = (time()-time_start)/60
print("\nTraining Time (in minutes) : ", elapsed)
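The `images.view(images.shape[0], -1)` call in the loop is what turns a batch of images into the flat vectors that the first Linear layer expects. A small sketch on a made-up batch, with `tch` assumed to be an alias for `torch`:

```python
import torch as tch

images = tch.rand(30, 1, 28, 28)             # one made-up batch of 30 images
flat = images.view(images.shape[0], -1)      # keep the batch dimension, flatten the rest
print(flat.shape)                            # torch.Size([30, 784])
```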

Step 4 : Testing the Model

correct = 0
total = 0
for images, labels in test:
    img = images.view(1, 784)  # flatten the single image
    with tch.no_grad():
        logps = model(img)  # log-probabilities for the 10 digits
    ps = tch.exp(logps)  # convert to probabilities
    probab = list(ps.numpy()[0])
    prediction = probab.index(max(probab))  # most probable digit
    truth = labels
    if truth == prediction:
        correct += 1
    total += 1
print("Number Of Images Tested : ", total)
print("Model Accuracy : ", (correct/total))
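The loop above tests one image at a time. As an alternative sketch (not the code used in this repository), the same accuracy can be computed in batches with argmax. The helper `accuracy`, the untrained `demo_model` and the random `demo_data` are all made up for illustration.

```python
import torch as tch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def accuracy(model, loader):
    """Fraction of examples whose most probable class matches the label."""
    correct, total = 0, 0
    model.eval()
    with tch.no_grad():
        for images, labels in loader:
            log_probs = model(images.view(images.shape[0], -1))
            predictions = log_probs.argmax(dim=1)   # index of the largest log-probability
            correct += (predictions == labels).sum().item()
            total += labels.shape[0]
    return correct / total

# Untrained stand-in model and random data, only to show the call.
demo_model = nn.Sequential(nn.Linear(784, 10), nn.LogSoftmax(dim=1))
demo_data = TensorDataset(tch.rand(64, 1, 28, 28), tch.randint(0, 10, (64,)))
demo_acc = accuracy(demo_model, DataLoader(demo_data, batch_size=32))
print(demo_acc)
```

With the real model and test set, the call would look like `accuracy(model, tch.utils.data.DataLoader(test, batch_size=1000))`, where the batch size of 1000 is an arbitrary choice.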

Step 5 : Saving the model

tch.save(model, './mnist_model.pt')
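The call above saves the whole model object. A commonly recommended alternative is to save only the parameters (the state_dict) and load them into a freshly built model of the same architecture. The sketch below shows that variant; the helper `build_model` and the file name `mnist_weights.pt` are hypothetical, and an untrained network stands in for the trained one.

```python
import os
import tempfile
import torch as tch
import torch.nn as nn

def build_model():
    """Same architecture as above: 784 -> 490 -> 10."""
    return nn.Sequential(nn.Linear(784, 490),
                         nn.LeakyReLU(),
                         nn.Linear(490, 10),
                         nn.LogSoftmax(dim=1))

trained = build_model()   # stands in for the trained model

# Hypothetical file name, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "mnist_weights.pt")
tch.save(trained.state_dict(), path)      # save only the parameters

restored = build_model()                  # same architecture, fresh weights
restored.load_state_dict(tch.load(path))  # overwrite them with the saved ones
restored.eval()
```

After loading, `restored` produces the same outputs as `trained` for the same input.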

Step 6 : Logging of Parameters during Model Training and Testing

To log and visualize the model parameters, TensorBoard has been used. For now, it logs Loss vs Epoch data, for which the graph can be accessed using :

tensorboard --logdir=runs

The logging happens at the following call, where writer is presumably a SummaryWriter from torch.utils.tensorboard :

writer.add_scalar("Loss", loss, num)

A graph of the following type is obtained as a result. It may vary if you change the algorithms and other parameters of the model :

image

To view the result for a random picture in the dataset, the following code can be used. It also creates a graph displaying the probabilities returned by the model :

import numpy as np

def view_classify(img, ps):
    # Show the image next to a bar chart of the predicted class probabilities
    ps = ps.cpu().data.numpy().squeeze()
    fig, (ax1, ax2) = plt.subplots(figsize=(6,9), ncols=2)
    ax1.imshow(img.resize_(1, 28, 28).numpy().squeeze())
    ax1.axis('off')
    ax2.barh(np.arange(10), ps)
    ax2.set_aspect(0.1)
    ax2.set_yticks(np.arange(10))
    ax2.set_yticklabels(np.arange(10))
    ax2.set_title('Class Probability')
    ax2.set_xlim(0, 1.1)
    plt.tight_layout()

# Pick a random image among the first 10001 training examples
img, label = train[np.random.randint(0, 10001)]
image = img.view(1, 784)
with tch.no_grad():
    logps = model(image)
ps = tch.exp(logps)  # convert log-probabilities to probabilities
probab = list(ps.numpy()[0])
print("Predicted Digit =", probab.index(max(probab)))
view_classify(image.view(1, 28, 28), ps)

Examples :

image

image

image

Model Accuracy : The accuracy of the model with this code is approximately 97.8% to 98.02%, with a training time of approximately 3.5 to 4 minutes.

Further Improvements :

  1. Expanding logging and graphing to other parameters to give a more comprehensive assessment of the model's performance.
  2. Testing different algorithms to strike a balance between training time and accuracy.

Contributions, suggestions, and inputs on logging and graphical representation for better understanding are welcome.

One of the trained models is also uploaded to this repository for reference purposes.