It has 60,000 training examples and 10,000 testing examples, so it is sufficiently large. Since the training set is 60,000 grayscale images of 28 x 28 pixels, it takes about 50MB, so it does not add complexity for batch generation: it can all fit in memory.
Like I said, ideal.
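As a quick sanity check on that size estimate, here is a minimal sketch of the arithmetic. It assumes each pixel is stored as one unsigned byte; the float32 line is an extra assumption of mine for comparison, in case the images are converted before training.

```python
# Rough memory footprint of the MNIST training images.
# Assumption: pixels are stored as 8-bit unsigned integers (1 byte each).
num_images = 60_000
height, width = 28, 28
bytes_per_pixel = 1

total_bytes = num_images * height * width * bytes_per_pixel
total_mb = total_bytes / 1_000_000

print(f"uint8:   {total_bytes} bytes, about {total_mb:.1f} MB")

# Assumption: the same images converted to float32 (4 bytes per pixel).
float32_mb = total_bytes * 4 / 1_000_000
print(f"float32: about {float32_mb:.1f} MB")
```

Either way the whole training set fits comfortably in memory on an ordinary machine, so no batch generator is needed.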
Let's start with the big one.
1.2M parameters - LeNet
CNN Error: 0.80%
train_loss: 0.0082
train_acc: 0.9984
val_loss: 0.0552
val_acc: 0.9920
The graph does look like it's overfitting a bit.
My first experiment used huge convolutions. I managed to reach 99% accuracy with 300k parameters, but I was not satisfied; surely there is a better way.
This model gets to 0.992 accuracy with only 36k parameters (!!)
CNN Error: 0.72%
train_loss: 0.0224
train_acc: 0.9928
val_loss: 0.0255
val_acc: 0.9928
Only 36k? Well, that is still huge. I looked around and found out it's possible with under 4k parameters, so I set myself the challenge and came up with this model.
CNN Error: 0.90%
train_loss: 0.0369
train_acc: 0.9880
val_loss: 0.0285
val_acc: 0.9910
The model has under 4k parameters (about 3.8k).
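To get a feel for where a parameter budget goes, here is a small sketch that counts trainable parameters by hand. The helper names and the layer stack are hypothetical, made up for illustration, and are not the models from this post. It only uses the standard counting rule: a convolution has kernel x kernel x in_channels x out_channels weights plus one bias per output channel.

```python
def conv2d_params(kernel_size, in_channels, out_channels, bias=True):
    """Trainable parameters in a 2D convolution with a square kernel."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def dense_params(in_features, out_features, bias=True):
    """Trainable parameters in a fully connected layer."""
    weights = in_features * out_features
    return weights + (out_features if bias else 0)


# Kernel size matters a lot: same channels, different kernel.
big_kernel = conv2d_params(5, 16, 32)    # one 5x5 layer, 16 -> 32 channels
small_kernel = conv2d_params(3, 16, 32)  # one 3x3 layer, 16 -> 32 channels
print("5x5 conv:", big_kernel, "| 3x3 conv:", small_kernel)

# Hypothetical tiny stack (NOT the 4k model from this post).
layers = [
    ("conv 3x3, 1 -> 8", conv2d_params(3, 1, 8)),
    ("conv 3x3, 8 -> 8", conv2d_params(3, 8, 8)),
    ("conv 3x3, 8 -> 16", conv2d_params(3, 8, 16)),
    ("dense 16 -> 10 after global average pooling", dense_params(16, 10)),
]
for name, count in layers:
    print(f"{name}: {count}")

total = sum(count for _, count in layers)
print("total:", total)
```

The point of the sketch is that narrow channels, small kernels, and a tiny final dense layer keep the count in the low thousands, while a single wide 5x5 convolution can already cost more than the whole budget.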
To summarise, what I've learned from this exercise is that larger models learn faster but also overfit faster, while smaller models need more training to find a good fit.
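The overfitting part of that summary can be checked against the numbers reported above. This is only a sketch: the dictionary just copies the metrics from this post, and the misclassified-image count assumes validation was run on the 10,000 test images, which I have not stated explicitly anywhere.

```python
# Metrics copied from the three runs reported in this post.
results = {
    "1.2M (LeNet)": {"train_loss": 0.0082, "train_acc": 0.9984,
                     "val_loss": 0.0552, "val_acc": 0.9920},
    "36k": {"train_loss": 0.0224, "train_acc": 0.9928,
            "val_loss": 0.0255, "val_acc": 0.9928},
    "4k": {"train_loss": 0.0369, "train_acc": 0.9880,
           "val_loss": 0.0285, "val_acc": 0.9910},
}

# Assumption: validation metrics come from the 10,000 test images.
TEST_SET_SIZE = 10_000


def summarise(metrics):
    """Return (accuracy gap, loss gap, misclassified images) for one run."""
    acc_gap = round(metrics["train_acc"] - metrics["val_acc"], 4)
    loss_gap = round(metrics["val_loss"] - metrics["train_loss"], 4)
    errors = round((1 - metrics["val_acc"]) * TEST_SET_SIZE)
    return acc_gap, loss_gap, errors


summary = {name: summarise(m) for name, m in results.items()}
for name, (acc_gap, loss_gap, errors) in summary.items():
    print(f"{name}: acc gap {acc_gap:+.4f}, loss gap {loss_gap:+.4f}, "
          f"{errors} misclassified")
```

A positive gap means the model does better on training data than on validation data. The big model shows the largest gap, the 36k model is almost perfectly balanced, and the 4k model actually scores slightly better on validation than on training.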
Credits
I would like to say thank you to EliteDataScience.com for getting this little exercise started.
My 36k model:
My 4k model: