In my experiment, I train a multilayer CNN for street view

house numbers recognition and check the accuracy with test data. The coding is

done in python using Tensorflow, a powerful library for implementation and

training deep neural networks. The central unit of data in TensorFlow is the

tensor. A tensor consists of a set of primitive values shaped into an array of

any number of dimensions. A tensor’s rank is its number of dimensions. 20

Along with TensorFlow used some other library function such as Numpy, Mathplotlib,

SciPy etc.

Firstly, as I have technical resource

limitation I perform my analysis only using the train and test dataset. And omit

extra dataset which is 2.7GB. Secondly, to make the analysis simpler I find and

delete all those data points which has more than 5 digits in the image. For the

implementation, I randomly shuffle valid dataset I have used the pickle file

svhn_multi which I created by preprocessing the data from the original SVHN

dataset. Then used the pickle file and train a 7-layer Convoluted Neural

Network. Finally, I castoff the test data to check for accuracy of the trained

model to detect number from street house number image.

At

the very beginning of my experiment, first convolution layer I used 16 feature

maps with 5×5 filters, and originate 28x28x16 output. A few ReLU layers are

also added after each layer to add more non-linearity to the decision-making

process. After first sub-sampling the output size decrease in 14x14x10. The second

convolution has 512 feature maps with 5×5 filters and produce 10x10x32 output. In

this moment applied sub-sampling second time and shrink the output size to

5x5x32. Finally, the third convolution has 2048 feature maps with same filter

size. It is mentionable that the stride size =1 in my experiment along with this

zero padding also used here. During my experiment, I used dropout technique to

reduce the overfitting. Finally, last layer is SoftMax regression layer. Weights

are initialized randomly using Xavier initialization which keeps the weights in

the right range. It automatically scales the initialization based on number of

output and input neurons. Now I train the network and log the accuracy, loss

and validation accuracy in steps of 500.

Initially we used a static learning rate of 0.01 but later

on switched to exponential decay learning rate with an initial learning rate of

0.05 which decays every 10000 steps with a base of 0.95. Also used Adagrad

Optimizer to minimize loss. We stop learning when we reach adequate accuracy

level for the test dataset and we save the hyper parameters in cnn_multi

checkpoint file so that it can be loaded later when we need to perform

detection without training the model again. Refinement The initial model

produced an accuracy of 89% with just 15000 steps. It’s a great starting point

and certainly after a few hours of training the accuracy will definitely reach

my benchmark of 90%. However, I further made some simple improvements to

further increase the accuracy with few number of learning steps. 1. Added a

dropout layer to the network after the third convolution layer just before

fully connected layer, which randomly drops weights from the network with a

keep probability of 0.9375 to add more redundancy to the network. This allows

the network to become more robust and prevents overfitting. 2. Introduced

exponential decay to learning rate instead of keeping it constant. This helps the

network to take bigger steps at first so that it learns fast but overtime as we

move closer to global minimum, take smaller noisier steps. With these changes,

the model is now able to produce accuracy of 92.9% on test set with 15000

steps. Since there are 230070 images in training set and about 13068 images in

test set, the model is expected to improve further if it is trained for longer

duration.