Tuesday, June 12, 2018

Tiny Model for MNIST Dataset

The MNIST database is a collection of handwritten digits. It's an ideal beginner dataset for learning simple image classification, and since it only contains 10 classes, it's relatively easy to work with.

It has 60,000 training examples and 10,000 test examples, which is sufficiently large. Since the 60,000 images are 28 x 28 grayscale, the whole set takes about 50MB, so it fits entirely in memory and batch generation adds no extra complexity.

Like I said, ideal.
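
As a quick sanity check on the size claim above, here is a minimal sketch (using the same keras.datasets.mnist loader as the scripts at the end of this post) that prints the array shapes and their raw size in memory:

from keras.datasets import mnist

# Load the pre-shuffled train/test split
(X_train, y_train), (X_test, y_test) = mnist.load_data()

print(X_train.shape, X_test.shape)   # (60000, 28, 28) (10000, 28, 28)
# One byte per grayscale pixel: ~47MB of training images, ~8MB of test images
print(X_train.nbytes / 1e6, "MB", X_test.nbytes / 1e6, "MB")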


I think the first lesson I did on image classification used the MNIST database as well. It was amusing to see a 1.2 million parameter model for a 50MB dataset, so I decided to see how low I could go.

Let's start with the big one.

1.2m parameters - LeNet



CNN Error: 0.80%
train_loss: 0.0082
train_acc: 0.9984
val_loss: 0.0552
val_acc: 0.9920
The graph does look like it's overfitting a bit.
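
The exact 1.2M-parameter model isn't reproduced here, but to give a sense of where that many parameters come from, here is a sketch of a common LeNet-style Keras MNIST network in that size range (an assumption on my part, not necessarily the exact model trained above); almost all of the parameters sit in the first dense layer:

from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Convolution2D, MaxPooling2D

model = Sequential()
model.add(Convolution2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))  # 320 params
model.add(Convolution2D(64, (3, 3), activation='relu'))                           # ~18k params
model.add(MaxPooling2D(2, 2))
model.add(Dropout(0.25))
model.add(Flatten())                      # 12 x 12 x 64 = 9216 features
model.add(Dense(128, activation='relu'))  # 9216 * 128 + 128 = ~1.18M params
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
model.summary()                           # ~1.2M params in total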


My first experiment used huge convolutions. I managed to reach 99% with 300k parameters, but I was not satisfied; surely there is a better way.

This model reaches 0.992 accuracy with only 36k parameters (!!)


CNN Error: 0.72%
train_loss: 0.0224
train_acc: 0.9928
val_loss: 0.0255
val_acc: 0.9928
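
Where do those 36k parameters sit? Counting from the layer definitions in the 36k listing at the end of this post, the breakdown is roughly:

Conv2D(64, 4x4):            1,088
Conv2D(16, 4x4):           16,400
Dense(70) after Flatten:   17,990
Dense(10, softmax):           710
Total:                     36,188

Roughly half the budget is the Dense(70) layer sitting on the flattened feature map.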

Only 36k? Well, that's huge. I looked around and found out it's possible with under 4k parameters, so I set up for the challenge and came up with this model.

CNN Error: 0.90%
train_loss: 0.0369
train_acc: 0.9880
val_loss: 0.0285
val_acc: 0.9910

The model is under 4k parameters (about 3.8k).
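
Counting from the layer definitions in the 4k listing below (and noting that each BatchNormalization layer adds 4 values per channel, half of them non-trainable), the parameters break down roughly as:

Conv2D(20, 5x5):                 520
BatchNormalization x3:           168
Conv2D(10, 1x1):                 210
Conv2D(12, 3x3):               1,092
Conv2D(10, 1x1):                 130
Dense(10) per position:          110
Dense(10, softmax):            1,610
Total:                         3,840

The big dense layer is gone; most of the work is done by small convolutions, and the largest single layer is now the final softmax at 1,610 parameters.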



To summarise what I've learned from this exercise: larger models learn faster but also overfit faster, while smaller models need more training to find a good fit.


Credits

I would like to say thank you to EliteDataScience.com for getting this little exercise started.



My 36k model:
# 3. Import libraries and modules
import numpy as np
np.random.seed(123) # for reproducibility
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D, AveragePooling2D
from keras.utils import np_utils
from keras.datasets import mnist
from matplotlib import pyplot as plt
#from keras.utils.vis_utils import plot_model
# 4. Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
orig_x_test = X_test
# 5. Preprocess input data
X_train = X_train.reshape(X_train.shape[0], 28,28,1)
X_test = X_test.reshape(X_test.shape[0], 28,28,1)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
# plt.imshow(X_train[0])
# plt.show()
# 6. Preprocess class labels
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)
#7. Define model architecture
model = Sequential()
model.add(Convolution2D(64, (4, 4), activation='relu', input_shape=(28,28,1)))
model.add(Dropout(0.25))
model.add(AveragePooling2D(2,2))
model.add(Convolution2D(16, (4, 4), activation='relu'))
model.add(Dropout(0.25))
model.add(AveragePooling2D(2,2))
model.add(Flatten())
model.add(Dropout(0.15))
model.add(Dense(70, activation='relu'))
model.add(Dense(10, activation='softmax'))
# 8. Compile model
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()
#plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)
# 9. Fit model on training data
history = model.fit(X_train, Y_train,
                    batch_size=16, epochs=20, verbose=1, shuffle=True,
                    validation_data=(X_test, Y_test))
print("history", history.history)
#plot history
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='test')
plt.legend()
plt.savefig('training.png')
#plt.show()
# 10. Evaluate model on test data
score = model.evaluate(X_test, Y_test, verbose=0)
#prediction = model.predict(X_test[0:1])
#print("prediction", prediction)
#plt.imshow(orig_x_test[0])
#plt.show()
print ("score", score)
print("Large CNN Error: %.2f%%" % (100-score[1]*100))



My 4k model:
# 3. Import libraries and modules
import numpy as np
np.random.seed(123) # for reproducibility
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution1D, Convolution2D, MaxPooling1D, MaxPooling2D, AveragePooling2D, AveragePooling1D, BatchNormalization
from keras.utils import np_utils
from keras.datasets import mnist
from keras.layers.core import Reshape
from matplotlib import pyplot as plt
#from keras.utils.vis_utils import plot_model
# 4. Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
orig_x_test = X_test
# 5. Preprocess input data
X_train = X_train.reshape(X_train.shape[0], 28,28,1)
X_test = X_test.reshape(X_test.shape[0], 28,28,1)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
# plt.imshow(X_train[0])
# plt.show()
print ("X_train shape1", X_train[0].shape)
# 6. Preprocess class labels
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)
#7. Define model architecture
model = Sequential()
model.add(Convolution2D(20, (5, 5) , activation='relu', input_shape=(28,28,1)))
model.add(BatchNormalization(momentum=0.1))
model.add(Dropout(0.1))
model.add(AveragePooling2D(2))
model.add(Convolution2D(10,(1,1) , activation='relu'))
model.add(BatchNormalization(momentum=0.1))
model.add(Dropout(0.1))
model.add(AveragePooling2D(2))
model.add(Convolution2D(12,(3,3) , activation='relu'))
model.add(BatchNormalization(momentum=0.1))
model.add(Dropout(0.1))
model.add(Convolution2D(10,(1,1) , activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(10, activation='relu'))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))
# 8. Compile model
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()
#plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)
#print ("X_train shape2", X_train.shape)
# 9. Fit model on training data
# Alternative runs (commented out): larger batch sizes with fewer epochs
# history = model.fit(X_train, Y_train,
#                     batch_size=512, epochs=20, verbose=1, shuffle=True,
#                     validation_data=(X_test, Y_test))
# history = model.fit(X_train, Y_train,
#                     batch_size=128, epochs=10, verbose=1, shuffle=True,
#                     validation_data=(X_test, Y_test))
history = model.fit(X_train, Y_train,
                    batch_size=8, epochs=80, verbose=1, shuffle=True,
                    validation_data=(X_test, Y_test))
print("history", history.history)
#plot history
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='test')
plt.legend()
#plt.savefig('training.png')
plt.show()
# 10. Evaluate model on test data
test_score = model.evaluate(X_test, Y_test, verbose=0)
print ("test score", test_score)
print("Test Large CNN Error: %.2f%%" % (100-test_score[1]*100))
train_score = model.evaluate(X_train, Y_train, verbose=0)
print ("train score", train_score)
print("Train Large CNN Error: %.2f%%" % (100-train_score[1]*100))
# prediction = model.predict(X_test[0:1])
# print("prediction", prediction)
# plt.imshow(orig_x_test[0])
# plt.show()
