It has 60,000 training examples and 10,000 test examples, which is large enough for this kind of experiment. The training set is 60,000 grayscale images of 28 x 28 pixels, which takes about 50MB, so it all fits in memory and batch generation adds no complexity.
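The 50MB figure is easy to check by hand. The sketch below assumes the images are stored as one byte per pixel (which is how `mnist.load_data()` returns them) and also shows the size after the `astype('float32')` conversion used in the scripts further down. The helper name `dataset_megabytes` is made up for this illustration.

```python
# Back-of-the-envelope size of the MNIST training images.
NUM_TRAIN = 60_000
HEIGHT, WIDTH = 28, 28


def dataset_megabytes(num_images, bytes_per_pixel):
    """Size in MB (10**6 bytes) of num_images single-channel 28x28 images."""
    return num_images * HEIGHT * WIDTH * bytes_per_pixel / 1e6


raw_mb = dataset_megabytes(NUM_TRAIN, 1)    # uint8, as loaded from disk
float_mb = dataset_megabytes(NUM_TRAIN, 4)  # after astype('float32')

print(f"uint8:   {raw_mb:.1f} MB")
print(f"float32: {float_mb:.1f} MB")
```

Even the float32 copy is well under 200MB, so holding the whole training set in memory is still not a problem.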
Like I said, ideal.
Let's start with the big one.
LeNet, 1.2M parameters
CNN Error: 0.80%
train_loss: 0.0082
train_acc: 0.9984
val_loss: 0.0552
val_acc: 0.9920
The graph does look like it's overfitting a bit.
My first experiment used huge convolutions. I managed to reach 99% with 300k parameters, but I was not satisfied; surely there is a better way.
This model reaches 0.992 accuracy with only 36k parameters (!!)
CNN Error: 0.72%
train_loss: 0.0224
train_acc: 0.9928
val_loss: 0.0255
val_acc: 0.9928
Only 36k? Well, that is still huge. I looked around and found out it is possible with under 4k parameters, so I took up the challenge and came up with this model.
CNN Error: 0.90%
train_loss: 0.0369
train_acc: 0.9880
val_loss: 0.0285
val_acc: 0.9910
This model has under 4k parameters (about 3.8k).
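The 36k and 3.8k figures can be reproduced by hand without running Keras. The sketch below is only a cross-check: the layer shapes are read off the two scripts at the end of this post, it assumes Keras' default `'valid'` padding, and it counts all four BatchNormalization vectors (gamma, beta, moving mean, moving variance) the way `model.summary()` does for its total. The helper names are made up for this illustration.

```python
def conv2d_params(in_channels, filters, kernel_h, kernel_w):
    """Weights plus one bias per filter."""
    return filters * (kernel_h * kernel_w * in_channels + 1)


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return out_units * (in_units + 1)


def batchnorm_params(channels):
    """gamma, beta, moving mean, moving variance (only the first two train)."""
    return 4 * channels


# 36k model: conv 4x4 -> pool -> conv 4x4 -> pool -> dense 70 -> dense 10
model_36k = [
    conv2d_params(1, 64, 4, 4),     # 28x28x1  -> 25x25x64
    conv2d_params(64, 16, 4, 4),    # 12x12x64 -> 9x9x16 (after 2x2 pooling)
    dense_params(4 * 4 * 16, 70),   # flattened 4x4x16 (after 2x2 pooling)
    dense_params(70, 10),
]

# 4k model: note the Dense(10) placed before Flatten, which only mixes channels
model_4k = [
    conv2d_params(1, 20, 5, 5),     # 28x28x1  -> 24x24x20
    batchnorm_params(20),
    conv2d_params(20, 10, 1, 1),    # 12x12x20 -> 12x12x10
    batchnorm_params(10),
    conv2d_params(10, 12, 3, 3),    # 6x6x10   -> 4x4x12
    batchnorm_params(12),
    conv2d_params(12, 10, 1, 1),    # 4x4x12   -> 4x4x10
    dense_params(10, 10),           # applied per position on the channel axis
    dense_params(4 * 4 * 10, 10),   # flattened 4x4x10 -> 10 classes
]

total_36k = sum(model_36k)
total_4k = sum(model_4k)
print("36k model:", total_36k)
print("4k model: ", total_4k)
```

Most of the savings in the small model come from the 1x1 convolutions and from shrinking the feature map to 4x4x10 before the final dense layer, which is where the 36k model spends about half of its parameters.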
To summarise what I've learned from this exercise: larger models learn faster but also overfit faster, while smaller models need more training to find a good fit.
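One rough way to put a number on "overfitting" is the gap between validation loss and training loss. The sketch below just applies that to the three sets of results reported above; the dictionary layout and the `generalization_gap` helper are assumptions made for this illustration, not part of the training scripts.

```python
# Final-epoch losses copied from the results reported in this post.
results = {
    "LeNet (1.2M params)": {"train_loss": 0.0082, "val_loss": 0.0552},
    "36k params": {"train_loss": 0.0224, "val_loss": 0.0255},
    "3.8k params": {"train_loss": 0.0369, "val_loss": 0.0285},
}


def generalization_gap(entry):
    """Validation loss minus training loss; larger means more overfitting."""
    return entry["val_loss"] - entry["train_loss"]


gaps = {name: generalization_gap(entry) for name, entry in results.items()}

for name, gap in gaps.items():
    print(f"{name:20s} gap = {gap:+.4f}")
```

The gap shrinks as the model gets smaller and even turns negative for the smallest one. Part of that is probably because Keras reports the training loss with dropout active, so it should be read as a rough indicator rather than a precise measurement.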
Credits
I would like to thank EliteDataScience.com for getting this little exercise started.

My 36k model:
# 3. Import libraries and modules
import numpy as np
np.random.seed(123)  # for reproducibility
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, AveragePooling2D
from keras.utils import to_categorical
from keras.datasets import mnist
from matplotlib import pyplot as plt
#from keras.utils.vis_utils import plot_model

# 4. Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
orig_x_test = X_test  # keep the raw images for plotting

# 5. Preprocess input data: add a channel axis and scale to [0, 1]
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
# plt.imshow(X_train[0])
# plt.show()

# 6. Preprocess class labels (one-hot encode the 10 digits)
Y_train = to_categorical(y_train, 10)
Y_test = to_categorical(y_test, 10)

# 7. Define model architecture
model = Sequential()
model.add(Conv2D(64, (4, 4), activation='relu', input_shape=(28, 28, 1)))
model.add(Dropout(0.25))
model.add(AveragePooling2D(2, 2))
model.add(Conv2D(16, (4, 4), activation='relu'))
model.add(Dropout(0.25))
model.add(AveragePooling2D(2, 2))
model.add(Flatten())
model.add(Dropout(0.15))
model.add(Dense(70, activation='relu'))
model.add(Dense(10, activation='softmax'))

# 8. Compile model
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()
#plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)

# 9. Fit model on training data
# (the test set doubles as validation data here)
history = model.fit(X_train, Y_train,
                    batch_size=16, epochs=20, verbose=1, shuffle=True,
                    validation_data=(X_test, Y_test))
print("history", history.history)

# plot history
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='test')
plt.legend()
plt.savefig('training.png')
#plt.show()

# 10. Evaluate model on test data
score = model.evaluate(X_test, Y_test, verbose=0)
#prediction = model.predict(X_test[0:1])
#print("prediction", prediction)
#plt.imshow(orig_x_test[0])
#plt.show()
print("score", score)
print("CNN Error: %.2f%%" % (100 - score[1] * 100))
My 4k model:
# 3. Import libraries and modules
import numpy as np
np.random.seed(123)  # for reproducibility
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, AveragePooling2D, BatchNormalization
from keras.utils import to_categorical
from keras.datasets import mnist
from matplotlib import pyplot as plt
#from keras.utils.vis_utils import plot_model

# 4. Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
orig_x_test = X_test  # keep the raw images for plotting

# 5. Preprocess input data: add a channel axis and scale to [0, 1]
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
# plt.imshow(X_train[0])
# plt.show()
print("X_train shape1", X_train[0].shape)

# 6. Preprocess class labels (one-hot encode the 10 digits)
Y_train = to_categorical(y_train, 10)
Y_test = to_categorical(y_test, 10)

# 7. Define model architecture
model = Sequential()
model.add(Conv2D(20, (5, 5), activation='relu', input_shape=(28, 28, 1)))
model.add(BatchNormalization(momentum=0.1))
model.add(Dropout(0.1))
model.add(AveragePooling2D(2))
model.add(Conv2D(10, (1, 1), activation='relu'))  # 1x1 conv: cheap channel mixing
model.add(BatchNormalization(momentum=0.1))
model.add(Dropout(0.1))
model.add(AveragePooling2D(2))
model.add(Conv2D(12, (3, 3), activation='relu'))
model.add(BatchNormalization(momentum=0.1))
model.add(Dropout(0.1))
model.add(Conv2D(10, (1, 1), activation='relu'))
model.add(Dropout(0.1))
# Dense before Flatten acts on the channel axis only, like another 1x1 conv
model.add(Dense(10, activation='relu'))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))

# 8. Compile model
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()
#plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)
#print("X_train shape2", X_train.shape)

# 9. Fit model on training data
# Earlier attempts with larger batches and fewer epochs:
# history = model.fit(X_train, Y_train,
#                     batch_size=512, epochs=20, verbose=1, shuffle=True,
#                     validation_data=(X_test, Y_test))
# history = model.fit(X_train, Y_train,
#                     batch_size=128, epochs=10, verbose=1, shuffle=True,
#                     validation_data=(X_test, Y_test))
history = model.fit(X_train, Y_train,
                    batch_size=8, epochs=80, verbose=1, shuffle=True,
                    validation_data=(X_test, Y_test))
print("history", history.history)

# plot history
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='test')
plt.legend()
#plt.savefig('training.png')
plt.show()

# 10. Evaluate model on test and training data
test_score = model.evaluate(X_test, Y_test, verbose=0)
print("test score", test_score)
print("Test CNN Error: %.2f%%" % (100 - test_score[1] * 100))
train_score = model.evaluate(X_train, Y_train, verbose=0)
print("train score", train_score)
print("Train CNN Error: %.2f%%" % (100 - train_score[1] * 100))
# prediction = model.predict(X_test[0:1])
# print("prediction", prediction)
# plt.imshow(orig_x_test[0])
# plt.show()