Project URL: https://github.com/Insignite/Alexnet-DogvsCat-Classification
I want to build a simple deep learning model for image classification on the Kaggle Dogs vs. Cats dataset. For this project, I chose the AlexNet architecture because it came up repeatedly during my Machine Learning course. The project is simple enough to help me understand AlexNet, familiarize myself with Keras, and gain more experience in the ML field.
Dataset
After downloading the dataset and extracting it from the zip file, let’s view the data.
The dataset doesn’t come with a label file, but I can extract the label from each image name in the train dataset.
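As a minimal sketch of that label extraction, assuming the Kaggle train images follow the `cat.0.jpg` / `dog.0.jpg` naming pattern (the helper name `label_from_filename` is made up for this illustration):

```python
from pathlib import Path


def label_from_filename(path):
    """Return 'cat' or 'dog' from a train image name such as 'cat.0.jpg'."""
    # The class is the text before the first dot in the file name.
    label = Path(path).name.split(".")[0].lower()
    if label not in ("cat", "dog"):
        raise ValueError(f"unexpected file name: {path}")
    return label


filenames = ["cat.0.jpg", "dog.0.jpg", "train/dog.12499.jpg"]
labels = [label_from_filename(name) for name in filenames]
print(list(zip(filenames, labels)))
```

Raising on an unexpected name keeps a stray file from silently ending up in the wrong class.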
I apply a data generator to add variety to the train dataset, which should help the model generalize and improve its accuracy. It also mimics a real-world dataset, because not every input image will be a perfect picture of a dog or a cat. Let’s view a sample from the generator.
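The generator settings are not listed in this post, so the following is only a dependency-free sketch of one typical augmentation, a random horizontal flip. The function name and the tiny nested-list "image" are assumptions for illustration; in practice Keras applies transforms like this inside the generator.

```python
import random


def random_horizontal_flip(image, rng, p=0.5):
    """Mirror an image left-to-right with probability p.

    `image` is a list of rows, each row a list of pixel values.
    """
    if rng.random() < p:
        return [row[::-1] for row in image]
    return [row[:] for row in image]


# Tiny 2x3 "image" so the effect is easy to see.
image = [[1, 2, 3],
         [4, 5, 6]]
rng = random.Random(0)
augmented = [random_horizontal_flip(image, rng) for _ in range(4)]
for sample in augmented:
    print(sample)
```

Each call returns either the original or its mirror image, so the model sees slightly different inputs from the same source picture across epochs.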
The train dataset is split into 80% training and 20% validation, with the image generator applied to both. The data is now ready for training.
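An 80/20 split can be sketched in plain Python as below. The helper `train_val_split`, the seed, and the 100 placeholder file names are assumptions for illustration, not the exact code used in the project.

```python
import random


def train_val_split(items, val_fraction=0.2, seed=42):
    """Shuffle items reproducibly and split them into (train, validation)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Placeholder file names standing in for the real train folder.
filenames = [f"cat.{i}.jpg" for i in range(50)] + [f"dog.{i}.jpg" for i in range(50)]
train_files, val_files = train_val_split(filenames)
print(len(train_files), len(val_files))  # 80 20
```

Shuffling before the split matters here because the file names are grouped by class; a fixed seed keeps the split the same between runs.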
Deep Learning Model
As mentioned, I use the AlexNet architecture to build the model. AlexNet consists of five convolutional layers, some followed by max pooling layers, and then three fully connected layers. Since the dataset has only two classes (Dog and Cat), I add a 2-way softmax as the last layer.
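A minimal Keras sketch of this stack, using the layer sizes from the table that follows. Several details are assumptions rather than the project's confirmed settings: `BatchNormalization` stands in for the Norm layers, ReLU is used as the activation, and the convolutions after the first use `padding="same"`. The function name `build_alexnet` is made up for this example.

```python
def build_alexnet(num_classes=2, input_shape=(227, 227, 3)):
    """Build an AlexNet-style Keras model (sketch; details are assumptions)."""
    # Imported lazily so this file can be loaded without TensorFlow installed.
    from tensorflow import keras
    from tensorflow.keras import layers

    return keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(96, 11, strides=4, padding="valid", activation="relu"),
        layers.MaxPooling2D(pool_size=3, strides=2),
        layers.BatchNormalization(),  # assumed stand-in for Norm_1
        layers.Conv2D(256, 5, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=3, strides=2),
        layers.BatchNormalization(),  # assumed stand-in for Norm_2
        layers.Conv2D(384, 3, padding="same", activation="relu"),
        layers.Conv2D(384, 3, padding="same", activation="relu"),
        layers.Conv2D(256, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=3, strides=2),
        layers.Flatten(),
        layers.Dense(4096, activation="relu"),
        layers.Dense(4096, activation="relu"),
        layers.Dense(1000, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),  # Dog vs Cat
    ])
```

Calling `build_alexnet().summary()` prints the per-layer output shapes, which makes it easy to compare against the table.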
Layer name | Output | Filters | Kernel size | Stride | Padding |
Input | 227x227x3 | – | – | – | – |
Convol_1 | 55x55x96 | 96 | 11×11 | 4 | valid |
MaxPool_1 | 27x27x96 | – | 3×3 | 2 | valid |
Norm_1 | 27x27x96 | – | – | – | – |
Convol_2 | 27x27x256 | 256 | 5×5 | 1 | same |
MaxPool_2 | 13x13x256 | – | 3×3 | 2 | valid |
Norm_2 | 13x13x256 | – | – | – | – |
Convol_3 | 13x13x384 | 384 | 3×3 | 1 | same |
Convol_4 | 13x13x384 | 384 | 3×3 | 1 | same |
Convol_5 | 13x13x256 | 256 | 3×3 | 1 | same |
MaxPool_3 | 6x6x256 | – | 3×3 | 2 | valid |
FullConnect_1 | 4096 | – | – | – | – |
FullConnect_2 | 4096 | – | – | – | – |
FullConnect_3 | 1000 | – | – | – | – |
FullConnect_4 | 2 (Dog vs Cat) | – | – | – | – |
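The spatial sizes in the table can be checked with the usual output-size arithmetic: `(size - kernel) // stride + 1` with no padding, and `ceil(size / stride)` with "same" padding. The sketch below walks the layers in order. One thing it makes visible: a 5×5 convolution with no padding would shrink 27×27 to 23×23, so the 27×27 and 13×13 outputs of Convol_2 through Convol_5 correspond to "same" padding. The helper name is made up for this example.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer (Keras-style padding)."""
    if padding == "same":
        return -(-size // stride)  # ceil(size / stride)
    return (size - kernel) // stride + 1  # "valid": no padding


# (name, kernel, stride, padding) for each conv and pooling layer, in order.
layers = [
    ("Convol_1", 11, 4, "valid"),
    ("MaxPool_1", 3, 2, "valid"),
    ("Convol_2", 5, 1, "same"),
    ("MaxPool_2", 3, 2, "valid"),
    ("Convol_3", 3, 1, "same"),
    ("Convol_4", 3, 1, "same"),
    ("Convol_5", 3, 1, "same"),
    ("MaxPool_3", 3, 2, "valid"),
]

size = 227
sizes = {}
for name, kernel, stride, padding in layers:
    size = conv_output_size(size, kernel, stride, padding)
    sizes[name] = size
    print(f"{name}: {size}x{size}")

# Number of values fed into the first fully connected layer.
flattened = size * size * 256
print("Flattened:", flattened)
```

The final 6×6×256 feature map flattens to 9216 values before FullConnect_1.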
Training
I am using a Huawei Matebook Pro with an 8th Gen Intel i7, 16GB RAM, and an NVIDIA GeForce MX150. It is definitely not a good laptop for any kind of machine learning project, so each epoch takes roughly 10-15 minutes. I decided to use a small number of epochs, but enough to get decent results. I tried 3, then 10, and finally 20 epochs. If you have stronger hardware, increasing to 50 or so will likely yield better results.
Let’s graph the train loss, train accuracy, validation loss, and validation accuracy for 20 epochs.
Result
Let’s show some predictions next to their images so the results are easier to inspect. I will use the first 20 images from the test results.
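Turning the 2-way softmax output into a readable label is a matter of picking the larger probability. A small sketch, assuming class index 0 is cat and 1 is dog (alphabetical order; worth confirming against the generator's `class_indices`). The probabilities are made-up values for illustration, not results from this model, and `to_labels` is a hypothetical helper name.

```python
CLASS_NAMES = ["cat", "dog"]  # assumed index order; verify with class_indices


def to_labels(probabilities, class_names=CLASS_NAMES):
    """Turn rows of softmax outputs into (label, confidence) pairs."""
    results = []
    for row in probabilities:
        # Index of the highest probability in this row.
        best = max(range(len(row)), key=row.__getitem__)
        results.append((class_names[best], row[best]))
    return results


# Made-up softmax outputs for three images, for illustration only.
example_probs = [[0.91, 0.09], [0.20, 0.80], [0.55, 0.45]]
for label, confidence in to_labels(example_probs):
    print(f"{label} ({confidence:.0%})")
```

Printing the confidence next to the label helps spot borderline cases, such as the third example, when looking through the predicted images.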
TADA!!! I now have a simple model that classifies pictures of dogs and cats.