In my experiment, I train a multilayer CNN for street view
house number recognition and check its accuracy on the test data. The code is
written in Python using TensorFlow, a powerful library for implementing and
training deep neural networks. The central unit of data in TensorFlow is the
tensor. A tensor consists of a set of primitive values shaped into an array of
any number of dimensions, and a tensor's rank is its number of dimensions.
Along with TensorFlow, I used other libraries such as NumPy, Matplotlib, and
SciPy.
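
As a minimal illustration of these concepts (the sketch below is my own, not code from the experiment), here are TensorFlow tensors of several ranks:

    import tensorflow as tf

    scalar = tf.constant(3.0)                 # rank 0: a single value
    vector = tf.constant([1.0, 2.0, 3.0])     # rank 1: shape (3,)
    matrix = tf.constant([[1.0, 2.0],
                          [3.0, 4.0]])        # rank 2: shape (2, 2)
    images = tf.zeros([64, 32, 32, 1])        # rank 4: a batch of 32x32 images

    for t in (scalar, vector, matrix, images):
        print(t.shape)                        # (), (3,), (2, 2), (64, 32, 32, 1)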

Firstly, since I had limited technical resources, I performed my analysis
using only the train and test datasets and omitted the extra dataset, which is
2.7GB. Secondly, to make the analysis simpler, I found and deleted all data
points whose images contain more than 5 digits. For the implementation, I
randomly shuffled the validation dataset and used the pickle file svhn_multi,
which I created by preprocessing the data from the original SVHN dataset. I
then used this pickle file to train a 7-layer Convolutional Neural Network.
Finally, I used the test data to check the accuracy of the trained model at
detecting numbers in street house number images.
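
A minimal sketch of this loading step is shown below; the file name svhn_multi comes from the text, but the .pickle extension and the dictionary keys are my assumptions about how the preprocessed data was stored:

    import pickle
    import numpy as np

    # Load the preprocessed SVHN data (key names are assumptions).
    with open('svhn_multi.pickle', 'rb') as f:
        data = pickle.load(f)
    train_X, train_y = data['train_dataset'], data['train_labels']
    test_X, test_y = data['test_dataset'], data['test_labels']

    # Shuffle examples and labels together with a single permutation.
    perm = np.random.permutation(len(train_X))
    train_X, train_y = train_X[perm], train_y[perm]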

At the very beginning of my experiment, the first convolution layer uses 16
feature maps with 5×5 filters and produces a 28x28x16 output. A ReLU layer is
added after each convolution layer to add more non-linearity to the
decision-making process. After the first sub-sampling, the output size
decreases to 14x14x16. The second convolution has 32 feature maps with 5×5
filters and produces a 10x10x32 output. At this point, sub-sampling is applied
a second time, shrinking the output size to 5x5x32. Finally, the third
convolution has 2048 feature maps with the same filter size. Note that the
stride size is 1 throughout my experiment, and zero padding is also used.
During my experiment, I used the dropout technique to reduce overfitting.
Finally, the last layer is a softmax regression layer. Weights are initialized
randomly using Xavier initialization, which keeps the weights in the right
range; it automatically scales the initialization based on the number of input
and output neurons. I then train the network and log the accuracy, loss, and
validation accuracy every 500 steps.
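
The sketch below reconstructs this layer stack in TensorFlow 1.x style to make the shapes concrete. The variable names and the 'VALID' convolution padding are my assumptions (the stated 28x28 and 10x10 outputs only follow without padding), and the 11-class head stands in for one of the per-digit softmax classifiers:

    import tensorflow as tf

    def conv_relu(x, depth_out, name):
        """5x5 convolution (stride 1) with Xavier-initialized weights + ReLU."""
        depth_in = int(x.shape[-1])
        w = tf.get_variable(name + '_w', [5, 5, depth_in, depth_out],
                            initializer=tf.glorot_uniform_initializer())
        b = tf.get_variable(name + '_b', [depth_out],
                            initializer=tf.zeros_initializer())
        return tf.nn.relu(tf.nn.conv2d(x, w, [1, 1, 1, 1], 'VALID') + b)

    images = tf.placeholder(tf.float32, [None, 32, 32, 1])    # grayscale input
    keep_prob = tf.placeholder(tf.float32)                    # dropout control

    h = conv_relu(images, 16, 'conv1')                            # 28x28x16
    h = tf.nn.max_pool(h, [1, 2, 2, 1], [1, 2, 2, 1], 'SAME')     # 14x14x16
    h = conv_relu(h, 32, 'conv2')                                 # 10x10x32
    h = tf.nn.max_pool(h, [1, 2, 2, 1], [1, 2, 2, 1], 'SAME')     # 5x5x32
    h = conv_relu(h, 2048, 'conv3')                               # 1x1x2048
    h = tf.nn.dropout(h, keep_prob)           # dropout to reduce overfitting
    h = tf.reshape(h, [-1, 2048])
    logits = tf.layers.dense(h, 11)           # softmax head (11 classes assumed)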

Initially, I used a static learning rate of 0.01 but later switched to an
exponentially decaying learning rate with an initial value of 0.05, which
decays every 10000 steps with a base of 0.95. I also used the AdaGrad
optimizer to minimize the loss. Learning stops when the model reaches an
adequate accuracy level on the test dataset, and the hyperparameters are saved
in the cnn_multi checkpoint file so that the model can be loaded later to
perform detection without training it again.
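
A sketch of this training setup in TensorFlow 1.x style follows; the decay parameters come from the text, while the stand-in loss and variable names are mine for illustration:

    import tensorflow as tf

    # Stand-in loss so the sketch runs; in the real model this is the
    # softmax cross-entropy over the digit logits.
    w = tf.Variable([5.0])
    loss = tf.reduce_mean(tf.square(w))

    global_step = tf.Variable(0, trainable=False)
    learning_rate = tf.train.exponential_decay(
        0.05, global_step, decay_steps=10000, decay_rate=0.95)
    train_op = tf.train.AdagradOptimizer(learning_rate).minimize(
        loss, global_step=global_step)

    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for step in range(15001):
            sess.run(train_op)
            if step % 500 == 0:            # log metrics every 500 steps
                print(step, sess.run([learning_rate, loss]))
        saver.save(sess, 'cnn_multi')      # checkpoint for later detection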

Refinement

The initial model produced an accuracy of 89% with just 15000 steps. That is a
good starting point, and after a few more hours of training the accuracy would
likely reach my benchmark of 90%. However, I made some simple improvements to
increase the accuracy further with fewer learning steps:

1. Added a dropout layer to the network after the third convolution layer,
just before the fully connected layer, which randomly drops weights from the
network with a keep probability of 0.9375 to add more redundancy to the
network. This makes the network more robust and helps prevent overfitting.

2. Introduced exponential decay to the learning rate instead of keeping it
constant. This lets the network take bigger steps at first so that it learns
fast, then smaller, more careful steps as it moves closer to the global
minimum.

With these changes, the model is now able to produce an accuracy of 92.9% on
the test set with 15000 steps. Since there are 230070 images in the training
set and about 13068 images in the test set, the model is expected to improve
further if it is trained for a longer duration.
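
For a sense of how gentle this schedule is over the 15000 training steps, the (non-staircase) decay described above can be evaluated directly:

    # lr(step) = 0.05 * 0.95 ** (step / 10000)
    for step in (0, 5000, 10000, 15000):
        print(step, round(0.05 * 0.95 ** (step / 10000.0), 4))
    # -> 0.05, 0.0487, 0.0475, 0.0463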