The Dogs vs. Cats dataset is a standard computer vision dataset. It involves classifying photos as containing either a dog or a cat. Though the problem sounds easy, it was only effectively addressed in the last few years using deep learning convolutional neural networks. While the dataset is effectively solved, it can be used as the basis for learning and practicing how to develop, evaluate, and use convolutional deep learning neural networks for image classification from scratch. This includes:
- How to develop a robust test harness for estimating the performance of the model,
- How to explore improvements to the model, and
- How to save the model and later load it to make predictions on new data.
In this article, we will walk through how to build an image classification model based on a Convolutional Neural Network (CNN), step by step.
- The dataset may be downloaded for free from the Kaggle website.
- First, sign up for a Kaggle account.
- Download the dataset by visiting the Dogs vs. Cats Data page.
- Click the Download All button.
- Unzip the 850-megabyte file.
- The data we use is a subset of the Kaggle Dogs vs. Cats dataset.
- There are 10,000 images in total: 80 percent for the training set and 20 percent for the test set.
- The training set contains 4,000 images of dogs.
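As a quick sanity check on the split described above, a sketch like the following computes the expected set sizes and counts the images in each class folder. The folder layout (one sub-folder per class under a directory such as dataset/training_set) and the .jpg extension are assumptions based on the paths used later in this article, and the helper names are hypothetical.

```python
from pathlib import Path


def split_sizes(total_images, train_fraction):
    """Return (training size, test size) for a given training fraction."""
    train_size = int(round(total_images * train_fraction))
    return train_size, total_images - train_size


def count_images(split_dir):
    """Count .jpg files in each class sub-folder of split_dir.

    Assumes one sub-folder per class, e.g. split_dir/cats and split_dir/dogs.
    """
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = len(list(class_dir.glob("*.jpg")))
    return counts


# 10,000 images with an 80/20 split, as described in the article
train_size, test_size = split_sizes(10000, 0.8)
```

Calling count_images on the training folder after unzipping is one way to confirm that each class has the expected number of images before training.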
Develop a CNN Model
Generally, we need four steps to build a CNN model. They are:
- Convolution
- Max pooling
- Flattening, and
- Full connection
- Convolution applies feature detectors to the input image.
- To simplify the concept, take a smiling face as the input image, represented as an array of 0s and 1s in the figure below.
- A feature detector is likewise an array of numbers.
- For each feature detector, we slide it over the image and produce a new array of numbers that represents one feature of the image.
- The operation between an input image and a feature detector that produces a feature map is called convolution, as shown in the figure below.
- Repeating this convolution with different feature detectors yields as many feature maps as there are feature detectors; together they form a convolution layer.
- Specifically, we use the Conv2D() function from Keras to build the first convolution layer.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dense
classifier = Sequential()
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))
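- The first argument, 32, is the number of feature detectors, (3, 3) is their size, and input_shape = (64, 64, 3) means 64×64 color images with three channels.
- The final argument is the activation function.
- We use ReLU to replace any negative values in the feature maps with zero.
- This is needed because, depending on the parameters used in convolution, feature maps can contain negative values.
- Replacing negative values adds non-linearity, which a non-linear classification problem requires.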
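To make the sliding operation concrete, here is a minimal plain-Python sketch of one feature detector applied to a small binary image, followed by ReLU. The 5×5 image and the 3×3 vertical-edge detector are made-up illustrations, not the ones from the figures, and the function names are hypothetical. As in most deep learning libraries, the detector is applied without flipping it.

```python
def convolve2d(image, kernel):
    """Slide kernel over image (no padding, stride 1) and sum the products."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for a in range(kernel_h):
                for b in range(kernel_w):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        feature_map.append(row)
    return feature_map


def relu(feature_map):
    """Replace every negative value in the feature map with zero."""
    return [[max(0, value) for value in row] for row in feature_map]


# Illustrative 5x5 binary image (a diamond shape)
image = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]

# Illustrative feature detector that responds to left-to-right intensity increases
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, kernel)  # 3x3 map containing negative values
activated = relu(feature_map)            # same map with negatives set to 0
```

The feature map here contains negative entries on its right-hand side, which is exactly the case ReLU is meant to handle.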
- Max pooling reduces the size of a feature map by sliding a window, for instance of size (2, 2), over it and taking the maximum value in each window.
- If we slide the window with a stride of 2 over a feature map of size (5, 5), we get a smaller feature map of size (3, 3), as shown in the figure below.
- Repeating max pooling on every feature map produces a pooling layer.
- Essentially, max pooling reduces the number of nodes in the fully connected layers without losing the main features and spatial structure information in the images.
- Specifically, we use the MaxPooling2D() function to add the pooling layer.
- In general, we use a 2×2 filter for pooling.
classifier.add(MaxPooling2D(pool_size = (2, 2)))
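The (5, 5) to (3, 3) example can be reproduced with a short plain-Python sketch. It keeps the partial windows at the right and bottom edges, which is what gives a 3×3 result; with its default padding = 'valid', MaxPooling2D would instead drop those partial windows and return a 2×2 map for a 5×5 input. The input values and the function name are made up for illustration.

```python
def max_pool(feature_map, size=2, stride=2):
    """Max pooling that keeps partial windows at the right and bottom edges."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(0, height, stride):
        row = []
        for j in range(0, width, stride):
            # Collect the values inside the window, clipped to the map borders
            window = [
                feature_map[a][b]
                for a in range(i, min(i + size, height))
                for b in range(j, min(j + size, width))
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Illustrative 5x5 feature map
feature_map = [
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 0, 1, 2, 1],
    [1, 4, 2, 1, 0],
    [0, 0, 1, 2, 1],
]

pooled = max_pool(feature_map)  # 3x3 map holding the maximum of each window
```

The largest response in the input (the 4) survives pooling, which illustrates how the main features are kept while the map shrinks.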
- Flattening converts all pooled feature maps into a single vector that serves as the input for the fully connected layers, as shown in the figure below.
- Convolution gives us many feature maps, each of which represents a specific feature of the image.
- Therefore, every node in the flattened vector represents a particular detail of the input image.
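The article does not show the Keras call for this step. It is presumably a Flatten layer imported from keras.layers and added with classifier.add(Flatten()), which is an assumption here. The plain-Python sketch below shows what flattening does and traces how large the vector becomes for the layers used so far, assuming the Keras defaults of no padding and a stride of 1 for the convolution. The helper names are hypothetical.

```python
def conv_output_size(size, kernel=3):
    """Side length after a convolution with no padding and stride 1."""
    return size - kernel + 1


def pool_output_size(size, pool=2):
    """Side length after pooling with stride equal to the pool size, no padding."""
    return size // pool


def flatten(feature_maps):
    """Concatenate every value of every feature map into one flat list."""
    return [value for fmap in feature_maps for row in fmap for value in row]


# Two tiny pooled feature maps flattened into one vector of length 8
pooled_maps = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
]
vector = flatten(pooled_maps)

# Size of the flattened vector for a 64x64 input, one 3x3 convolution
# with 32 feature detectors, and one 2x2 pooling layer
side = pool_output_size(conv_output_size(64))  # 64 -> 62 -> 31
flattened_length = side * side * 32
```

Under these assumptions the first hidden layer receives a vector of 31 × 31 × 32 values, which is why pooling before flattening matters for the size of the fully connected layers.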
- With the steps above, we have converted an image into a one-dimensional vector.
- Now we will build a classifier using this vector as the input layer.
- First, we create a hidden layer. units is the number of nodes in the hidden layer (older Keras versions call this argument output_dim).
- As is common practice, we start with 128 nodes and use ReLU as the activation function.
classifier.add(Dense(units = 128, activation = 'relu'))
- After that, we add an output layer. For binary classification, units is 1 and the activation function is sigmoid.
classifier.add(Dense(units = 1, activation = 'sigmoid'))
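What the two dense layers compute can be sketched in plain Python: each node takes a weighted sum of its inputs plus a bias and passes it through the activation function. The tiny input vector, weights, and biases below are made-up numbers for illustration only, and the function names are hypothetical; the real model learns its weights during training.

```python
import math


def relu(z):
    """ReLU activation: negative inputs become zero."""
    return max(0.0, z)


def sigmoid(z):
    """Sigmoid activation: squashes any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: one weight list and one bias per node."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs


# Illustrative flattened vector with three values
flattened = [1.0, 0.0, 2.0]

# Hidden layer with two nodes (the article uses 128) and ReLU
hidden = dense(flattened, [[0.5, -1.0, 0.25], [-1.0, 0.0, 0.2]], [0.0, 0.1], relu)

# Output layer with a single sigmoid node, as used for binary classification
probability = dense(hidden, [[2.0, -1.0]], [-2.0], sigmoid)[0]
```

The single sigmoid output can be read as the probability of the class labeled 1, which is why one output node is enough for a two-class problem.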
Final model structure
- With all layers added, let's compile the CNN by selecting an optimizer (a stochastic gradient descent algorithm), a loss function, and performance metrics.
- We use binary_crossentropy for binary classification and categorical_crossentropy for multi-class classification problems.
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
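For intuition about the loss and metric chosen here, the sketch below computes binary cross-entropy and accuracy by hand for four made-up predictions. It is a plain-Python illustration of the formulas, not the Keras implementation; the clipping constant and the 0.5 threshold are assumptions.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -(t * log(p) + (1 - t) * log(1 - p)) over all samples."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


def accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    correct = sum(
        1 for t, p in zip(y_true, y_pred) if (p > threshold) == (t == 1)
    )
    return correct / len(y_true)


# Illustrative labels (1 = dog, 0 = cat) and predicted probabilities
labels = [1, 0, 1, 0]
predictions = [0.9, 0.2, 0.6, 0.7]

loss = binary_crossentropy(labels, predictions)
acc = accuracy(labels, predictions)
```

The last prediction is confidently wrong (0.7 for a cat), so it contributes the largest share of the loss even though it counts as just one error in the accuracy.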
- We have 8,000 images in the training set, which is not enough to avoid over-fitting.
- Therefore, we apply image augmentation, such as rotating, flipping, or shearing, to increase the number of images.
- Augmentation divides the training images into batches and applies random transformations to a random selection of images in each batch, creating many more diverse images.
- The code snippet below augments the images and then fits and tests the CNN.
from keras.preprocessing.image import ImageDataGenerator
# Augment the training images; only rescale the test images
train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)
train_set = train_datagen.flow_from_directory('dataset/training_set', target_size=(64, 64), batch_size=32, class_mode='binary')
test_set = test_datagen.flow_from_directory('dataset/test_set', target_size=(64, 64), batch_size=32, class_mode='binary')
# Step counts must be integers; recent Keras versions accept generators in fit() instead of fit_generator()
classifier.fit_generator(train_set, steps_per_epoch=8000 // 32, epochs=25, validation_data=test_set, validation_steps=2000 // 32)
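Two details of the snippet above can be checked with a little arithmetic: how many batches of 32 make up one pass over each set, and what rescaling and a horizontal flip do to pixel values. The sketch below is plain Python with hypothetical helper names and a made-up 2×2 grayscale image; it rounds the step count up so that a final partial batch is also counted, which is one possible choice.

```python
import math


def steps_for(num_images, batch_size):
    """Number of batches needed to see every image once (rounded up)."""
    return math.ceil(num_images / batch_size)


def rescale(image, factor=1.0 / 255):
    """Scale pixel values, e.g. from the range 0..255 to the range 0..1."""
    return [[pixel * factor for pixel in row] for row in image]


def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


train_steps = steps_for(8000, 32)  # 8000 / 32 divides evenly
test_steps = steps_for(2000, 32)   # 2000 / 32 = 62.5, so one batch is partial

# Illustrative 2x2 grayscale image with pixel values in 0..255
tiny_image = [[0, 255], [51, 102]]
flipped = horizontal_flip(tiny_image)
scaled = rescale(tiny_image)
```

Because 2000 is not a multiple of 32, the number of validation steps has to be rounded one way or the other: rounding down skips the last partial batch, rounding up includes it.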
- Now, let's fit and test the model. In the end, we obtained a training accuracy of 86 percent and a test accuracy of 76 percent, with minor over-fitting.
- There is room to increase accuracy and decrease over-fitting. There are two options: adding more convolution layers or adding more dense layers.
- Let's add one more convolution layer with its own pooling layer, placed after the first pooling layer and before flattening.
classifier.add(Conv2D(32, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))
- Run the model on the training and test sets again. This time, we got an improved training accuracy of 91 percent and a test accuracy of 82 percent.
- We create a folder named 'single_prediction' for the images to be predicted, as shown in the figure below.
- We use the image module from Keras to load test images.
- Set the target_size of the image to be (64, 64).
import numpy as np
from keras.preprocessing import image
test_image = image.load_img('dataset/single_prediction/cat_or_dog_1.jpg', target_size = (64, 64))
- Next, we convert the image from a 2D array to a 3D array, which adds a dimension for the color channels.
test_image = image.img_to_array(test_image)
- Now add batch dimension at index 0.
test_image = np.expand_dims(test_image, axis = 0)
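The effect of adding the batch dimension is easiest to see on the array shapes. The sketch below uses a zero-filled NumPy array as a stand-in for a loaded 64×64 color image, so it runs without Keras or an image file; the stand-in array is an assumption for illustration.

```python
import numpy as np

# Stand-in for a 64x64 RGB image as a 3D array: height x width x channels
test_image = np.zeros((64, 64, 3), dtype="float32")

# Add a batch dimension at index 0, giving a batch that holds one image
batch = np.expand_dims(test_image, axis=0)

image_shape = test_image.shape  # (64, 64, 3)
batch_shape = batch.shape       # (1, 64, 64, 3)
```

The model was trained on batches of images, so even a single image has to be wrapped in a batch of size one before it can be passed to predict.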
- Now make a prediction.
result = classifier.predict(test_image)
- We obtained a result of 1. To see the mapping between animals and their associated numerical values, we use the class_indices attribute of the training set generator (train_set.class_indices).
- This shows that 0 is a cat and 1 is a dog, so our convolutional neural network made a correct prediction.
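Turning the numeric result into a readable label can be sketched as follows. The dictionary mirrors the mapping stated above (0 is a cat, 1 is a dog), but the folder names 'cats' and 'dogs', the 0.5 threshold, and the function name are assumptions. Since the training generator rescales pixels by 1/255, it may also be worth applying the same scaling to the test image before predicting.

```python
def label_from_probability(probability, class_indices, threshold=0.5):
    """Map a sigmoid output to a class name using a name -> index mapping."""
    index_to_label = {index: name for name, index in class_indices.items()}
    predicted_index = 1 if probability > threshold else 0
    return index_to_label[predicted_index]


# Assumed mapping, following the article: 0 is a cat and 1 is a dog
class_indices = {"cats": 0, "dogs": 1}

# A result of 1, as obtained in the article, maps to the dog class
prediction = label_from_probability(1.0, class_indices)
```

With a trained model, the probability would come from the first element of the array returned by predict rather than from a hard-coded value.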