Cainvas

Insect bite identification

Credit: AITS Cainvas Community

Photo by Oliver Sin on Dribbble

Many of the insects around us bite, and the effects range from mild itching to venom injection. Identifying which insect caused a bite is the first step in choosing the right treatment.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
import random
import os
from PIL import Image
from tensorflow.keras import layers
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from tensorflow.keras.callbacks import EarlyStopping
import tensorflow.keras

The dataset

In [2]:
!wget -N "https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/insect_bite.zip"
!unzip -qo insect_bite.zip
!rm insect_bite.zip
--2021-09-07 09:09:52--  https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/insect_bite.zip
Resolving cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)... 52.219.160.71
Connecting to cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)|52.219.160.71|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1976059 (1.9M) [application/zip]
Saving to: ‘insect_bite.zip’

insect_bite.zip     100%[===================>]   1.88M  --.-KB/s    in 0.01s   

2021-09-07 09:09:52 (129 MB/s) - ‘insect_bite.zip’ saved [1976059/1976059]

In [3]:
data_dir = 'insect bite'

print("Number of samples in - ")
for f in os.listdir(data_dir + '/'):
    if os.path.isdir(data_dir + '/' + f):
        print('\n'+f.upper())
        for fx in os.listdir(data_dir + '/' + f + '/'):
            print(fx, " : ", len(os.listdir(data_dir + '/' + f +'/' + fx + '/')))
Number of samples in - 

TEST
mosquito  :  2
tick  :  2

TRAIN
mosquito  :  21
tick  :  26

VALIDATION
mosquito  :  5
tick  :  5
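
The nested `os.listdir` loop above can also be written with `pathlib`, returning the counts as a dictionary instead of printing them. This is a minimal sketch, assuming the same `split/class/file` layout; `count_samples` is a hypothetical helper that is not part of the notebook, and the demo builds a throwaway directory with made-up file names rather than reading the real dataset.

```python
import tempfile
from pathlib import Path


def count_samples(data_dir):
    """Return {split: {class_name: file_count}} for a split/class/file layout."""
    counts = {}
    for split in sorted(Path(data_dir).iterdir()):
        if not split.is_dir():
            continue
        counts[split.name] = {
            cls.name: sum(1 for f in cls.iterdir() if f.is_file())
            for cls in sorted(split.iterdir())
            if cls.is_dir()
        }
    return counts


# Demo on a temporary directory that mimics the layout (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    for split, cls, n in [("train", "mosquito", 3), ("train", "tick", 2), ("test", "tick", 1)]:
        folder = Path(tmp) / split / cls
        folder.mkdir(parents=True)
        for i in range(n):
            (folder / f"img_{i}.jpg").touch()
    demo_counts = count_samples(tmp)

print(demo_counts)
```

With the real dataset, `count_samples('insect bite')` would be expected to report the same numbers as the printed listing.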
In [4]:
batch = 8

# The train, val and test datasets
print("Train dataset")
train_ds = tf.keras.preprocessing.image_dataset_from_directory(data_dir+'/train', batch_size=batch)

print("Validation dataset")
val_ds = tf.keras.preprocessing.image_dataset_from_directory(data_dir+'/validation', batch_size=batch)

print("Test dataset")
test_ds = tf.keras.preprocessing.image_dataset_from_directory(data_dir+'/test', batch_size=batch)
Train dataset
Found 47 files belonging to 2 classes.
Validation dataset
Found 10 files belonging to 2 classes.
Test dataset
Found 4 files belonging to 2 classes.
In [5]:
# Looking into the class names

class_names = train_ds.class_names
print(class_names)
['mosquito', 'tick']

Visualization

In [6]:
num_samples = 3    # the number of samples to be displayed in each class

for x in class_names:
    plt.figure(figsize=(10, 10))

    filenames = os.listdir(data_dir + '/train/' + x)

    for i in range(num_samples):
        ax = plt.subplot(1, num_samples, i + 1)
        img = Image.open(data_dir + '/train/' + x + '/' + filenames[i])
        plt.imshow(img)
        plt.title(x)
        plt.axis("off")

Preprocessing

In [7]:
# Looking into the shape of the batches and individual samples
# Set the input shape

print("Looking into the shape of images and labels in one batch\n")  

for image_batch, labels_batch in train_ds:
    input_shape = image_batch[0].shape
    print("Shape of images input for one batch: ", image_batch.shape)
    print("Shape of images labels for one batch: ", labels_batch.shape)
    break
Looking into the shape of images and labels in one batch

Shape of images input for one batch:  (8, 256, 256, 3)
Shape of images labels for one batch:  (8,)
In [8]:
# Normalizing the pixel values

normalization_layer = layers.experimental.preprocessing.Rescaling(1./255)

train_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
val_ds = val_ds.map(lambda x, y: (normalization_layer(x), y))
test_ds = test_ds.map(lambda x, y: (normalization_layer(x), y))
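
To illustrate what the rescaling step does to the pixel values, here is a small NumPy sketch of the same arithmetic. The 2x2 "image" is made up for the example, and the code only mirrors the multiplication by 1/255; it does not call the Keras layer.

```python
import numpy as np

# A made-up 2x2 RGB "image" with pixel values in [0, 255].
image = np.array([[[0, 128, 255], [64, 64, 64]],
                  [[255, 255, 255], [10, 20, 30]]], dtype=np.float32)

# Same arithmetic as Rescaling(1./255): every value is multiplied by 1/255.
scaled = image * (1.0 / 255)

print("before:", image.min(), image.max())
print("after: ", scaled.min(), scaled.max())
```

After scaling, all values lie in the range 0 to 1 and the array shape is unchanged.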
In [9]:
# Augmenting images in the train set to increase dataset size

data_augmentation = tf.keras.Sequential(
    [
        layers.experimental.preprocessing.RandomFlip("horizontal_and_vertical"),    # Flip along both axes
        layers.experimental.preprocessing.RandomZoom(0.1),    # Randomly zoom images in dataset
        layers.experimental.preprocessing.RandomRotation((-0.1, 0.1))
    ])


print("Train size (number of batches) before augmentation: ", len(train_ds))

# Apply only to train set    
aug_ds = train_ds.map(lambda x, y: (data_augmentation(x, training=True), y))

print("Size (number of batches) of augmented dataset: ", len(aug_ds))

#Adding to train_ds
train_ds = train_ds.concatenate(aug_ds)

print("Train size (number of batches) after augmentation: ", len(train_ds))
Train size (number of batches) before augmentation:  6
Size (number of batches) of augmented dataset:  6
Train size (number of batches) after augmentation:  12
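
The batch counts printed above follow directly from the dataset size and the batch size. A short worked check, using only the numbers reported earlier in the notebook (47 training files, batch size 8):

```python
import math

train_samples = 47   # from "Found 47 files belonging to 2 classes."
batch_size = 8       # value of `batch` in the notebook

# The last batch holds whatever is left over, so round up.
batches = math.ceil(train_samples / batch_size)
last_batch = train_samples - (batches - 1) * batch_size

# Concatenating the augmented copy doubles the number of batches.
batches_after_augmentation = 2 * batches

print(batches, last_batch, batches_after_augmentation)
```

The doubled count corresponds to the `12/12` steps per epoch shown in the training logs.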

The model

In [10]:
base_model = tensorflow.keras.applications.VGG16(weights='imagenet', input_shape=input_shape, include_top=False)    # False, do not include the classification layer of the model

base_model.trainable = False

inputs = tensorflow.keras.Input(shape=input_shape)

x = base_model(inputs, training=False)
x = tensorflow.keras.layers.GlobalAveragePooling2D()(x)
outputs = tensorflow.keras.layers.Dense(len(class_names), activation = 'softmax')(x)    # Add own classification layer

model = tensorflow.keras.Model(inputs, outputs)

cb = [EarlyStopping(monitor = 'val_loss', patience = 5, restore_best_weights = True)]
model.summary()
Model: "functional_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         [(None, 256, 256, 3)]     0         
_________________________________________________________________
vgg16 (Functional)           (None, 8, 8, 512)         14714688  
_________________________________________________________________
global_average_pooling2d (Gl (None, 512)               0         
_________________________________________________________________
dense (Dense)                (None, 2)                 1026      
=================================================================
Total params: 14,715,714
Trainable params: 1,026
Non-trainable params: 14,714,688
_________________________________________________________________
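
The parameter counts in the summary can be reproduced by hand. The sketch below assumes the standard VGG16 convolutional base (thirteen 3x3 convolutions and five 2x2 max-pooling stages) and the 512-to-2 dense layer added in this notebook; it is plain arithmetic and does not load the model.

```python
# 3x3 convolutions in VGG16's convolutional base: (input channels, output channels)
conv_layers = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]

# Each convolution has 3*3*c_in*c_out weights plus c_out biases.
base_params = sum(3 * 3 * c_in * c_out + c_out for c_in, c_out in conv_layers)

# Dense head: 512 pooled features -> 2 classes, plus 2 biases.
head_params = 512 * 2 + 2

# Five pooling stages each halve the 256x256 input.
feature_map_side = 256 // 2 ** 5

print(base_params, head_params, base_params + head_params, feature_map_side)
```

Only the dense head is trainable here, which is why the summary lists 1,026 trainable parameters.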
In [11]:
model.compile(loss=SparseCategoricalCrossentropy(), optimizer=Adam(0.01), metrics=['accuracy'])

history1 = model.fit(train_ds, validation_data =  val_ds, epochs=32, callbacks=cb)
Epoch 1/32
12/12 [==============================] - 2s 165ms/step - loss: 0.7615 - accuracy: 0.5319 - val_loss: 0.8130 - val_accuracy: 0.5000
Epoch 2/32
12/12 [==============================] - 1s 92ms/step - loss: 0.7312 - accuracy: 0.4787 - val_loss: 0.6567 - val_accuracy: 0.5000
Epoch 3/32
12/12 [==============================] - 1s 85ms/step - loss: 0.6727 - accuracy: 0.5851 - val_loss: 0.6617 - val_accuracy: 0.5000
Epoch 4/32
12/12 [==============================] - 1s 89ms/step - loss: 0.6363 - accuracy: 0.6383 - val_loss: 0.6460 - val_accuracy: 0.6000
Epoch 5/32
12/12 [==============================] - 1s 87ms/step - loss: 0.6479 - accuracy: 0.5957 - val_loss: 0.6295 - val_accuracy: 0.6000
Epoch 6/32
12/12 [==============================] - 1s 85ms/step - loss: 0.6129 - accuracy: 0.7128 - val_loss: 0.6308 - val_accuracy: 0.7000
Epoch 7/32
12/12 [==============================] - 1s 83ms/step - loss: 0.6295 - accuracy: 0.6489 - val_loss: 0.6301 - val_accuracy: 0.6000
Epoch 8/32
12/12 [==============================] - 1s 84ms/step - loss: 0.6075 - accuracy: 0.6915 - val_loss: 0.6323 - val_accuracy: 0.6000
Epoch 9/32
12/12 [==============================] - 1s 88ms/step - loss: 0.6031 - accuracy: 0.6702 - val_loss: 0.6193 - val_accuracy: 0.6000
Epoch 10/32
12/12 [==============================] - 1s 88ms/step - loss: 0.6588 - accuracy: 0.5957 - val_loss: 0.7191 - val_accuracy: 0.5000
Epoch 11/32
12/12 [==============================] - 1s 84ms/step - loss: 0.6434 - accuracy: 0.5851 - val_loss: 0.7030 - val_accuracy: 0.5000
Epoch 12/32
12/12 [==============================] - 1s 87ms/step - loss: 0.6278 - accuracy: 0.6383 - val_loss: 0.6065 - val_accuracy: 0.7000
Epoch 13/32
12/12 [==============================] - 1s 89ms/step - loss: 0.5584 - accuracy: 0.7660 - val_loss: 0.6047 - val_accuracy: 0.6000
Epoch 14/32
12/12 [==============================] - 1s 86ms/step - loss: 0.6004 - accuracy: 0.6596 - val_loss: 0.6641 - val_accuracy: 0.7000
Epoch 15/32
12/12 [==============================] - 1s 89ms/step - loss: 0.5741 - accuracy: 0.7021 - val_loss: 0.6110 - val_accuracy: 0.6000
Epoch 16/32
12/12 [==============================] - 1s 84ms/step - loss: 0.5771 - accuracy: 0.6809 - val_loss: 0.6591 - val_accuracy: 0.6000
Epoch 17/32
12/12 [==============================] - 1s 84ms/step - loss: 0.6234 - accuracy: 0.6170 - val_loss: 0.6574 - val_accuracy: 0.7000
Epoch 18/32
12/12 [==============================] - 1s 85ms/step - loss: 0.5706 - accuracy: 0.6489 - val_loss: 0.6112 - val_accuracy: 0.5000
In [12]:
model.compile(loss=SparseCategoricalCrossentropy(), optimizer=Adam(0.001), metrics=['accuracy'])

history2 = model.fit(train_ds, validation_data =  val_ds, epochs=32, callbacks=cb)
Epoch 1/32
12/12 [==============================] - 1s 99ms/step - loss: 0.5642 - accuracy: 0.7234 - val_loss: 0.6063 - val_accuracy: 0.7000
Epoch 2/32
12/12 [==============================] - 1s 92ms/step - loss: 0.5555 - accuracy: 0.7872 - val_loss: 0.6050 - val_accuracy: 0.6000
Epoch 3/32
12/12 [==============================] - 1s 85ms/step - loss: 0.5841 - accuracy: 0.7234 - val_loss: 0.6051 - val_accuracy: 0.6000
Epoch 4/32
12/12 [==============================] - 1s 85ms/step - loss: 0.5641 - accuracy: 0.7872 - val_loss: 0.6052 - val_accuracy: 0.6000
Epoch 5/32
12/12 [==============================] - 1s 85ms/step - loss: 0.5327 - accuracy: 0.8085 - val_loss: 0.6087 - val_accuracy: 0.8000
Epoch 6/32
12/12 [==============================] - 1s 86ms/step - loss: 0.5458 - accuracy: 0.7447 - val_loss: 0.6087 - val_accuracy: 0.8000
Epoch 7/32
12/12 [==============================] - 1s 86ms/step - loss: 0.5484 - accuracy: 0.7553 - val_loss: 0.6059 - val_accuracy: 0.7000
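
Both training runs above end before epoch 32 because of the `EarlyStopping` callback with `patience = 5`. The sketch below is a simplified re-implementation of the patience rule, not the Keras source: it assumes a minimum-delta of 0 and that the counter and best value start fresh for each call to `fit`. `stopping_epoch` is a hypothetical helper, and the loss values are copied from the two logs.

```python
def stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), 1-indexed; stop_epoch is None if never triggered."""
    best = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # Improvement: remember it and reset the counter.
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# val_loss per epoch, copied from the two training logs.
run1 = [0.8130, 0.6567, 0.6617, 0.6460, 0.6295, 0.6308, 0.6301, 0.6323, 0.6193,
        0.7191, 0.7030, 0.6065, 0.6047, 0.6641, 0.6110, 0.6591, 0.6574, 0.6112]
run2 = [0.6063, 0.6050, 0.6051, 0.6052, 0.6087, 0.6087, 0.6059]

print(stopping_epoch(run1, patience=5))
print(stopping_epoch(run2, patience=5))
```

Under these assumptions the first run stops at epoch 18 with its best `val_loss` at epoch 13, and the second stops at epoch 7 with its best at epoch 2. Since `restore_best_weights = True`, the model would be expected to keep the weights from the best epoch of the most recent run.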
In [13]:
model.evaluate(test_ds)
1/1 [==============================] - 0s 1ms/step - loss: 0.7067 - accuracy: 0.7500
Out[13]:
[0.7067238092422485, 0.75]
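
With so few evaluation samples, accuracy can only take a handful of values. The following sketch lists them for the 4 test images and the 10 validation images; `possible_accuracies` is a hypothetical helper introduced only for this illustration.

```python
from fractions import Fraction


def possible_accuracies(n_samples):
    """All accuracy values reachable with n_samples predictions."""
    return [Fraction(k, n_samples) for k in range(n_samples + 1)]


test_values = [float(a) for a in possible_accuracies(4)]    # 4 test images
val_values = [float(a) for a in possible_accuracies(10)]    # 10 validation images

# A test accuracy of 0.75 corresponds to this many correct predictions.
correct_on_test = Fraction(3, 4) * 4

print(test_values)
print(val_values)
print(correct_on_test)
```

A test accuracy of 0.75 therefore means 3 of the 4 test images were classified correctly, and a single image changes the score by 25 percentage points.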

Plotting the metrics

In [14]:
def plot(history1, history2, variable1, variable2):
    # combining metrics from both trainings (new lists, so the history dicts are not modified)
    var1_history = history1[variable1] + history2[variable1]
    var2_history = history1[variable2] + history2[variable2]

    # plotting them
    plt.plot(range(len(var1_history)), var1_history)
    plt.plot(range(len(var2_history)), var2_history)
    plt.legend([variable1, variable2])
    plt.title(variable1)
In [15]:
plot(history1.history, history2.history, "accuracy", 'val_accuracy')
In [16]:
plot(history1.history, history2.history, "loss", 'val_loss')

Prediction

In [17]:
# prediction for all samples in the dataset

plt.figure(figsize=(20, 20))

for i in test_ds.as_numpy_iterator():
    img, label = i  
    for x in range(len(label)):  
        ax = plt.subplot(1, len(label), x + 1)
        plt.axis('off')   # remove axes
        plt.imshow(img[x])    # one image from the batch: (batch, 256, 256, 3) --> (256, 256, 3)
        output = model.predict(np.expand_dims(img[x],0))    # getting output; input shape (256, 256, 3) --> (1, 256, 256, 3)
        pred = np.argmax(output[0])    # index of the highest probability
        t = "Predicted: " + class_names[pred]    # Picking the label from class_names based on the model output
        t = t + "\nTrue: " + class_names[label[x]]
        t = t + "\nProbability: " + str(output[0][pred])
        plt.title(t)
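
The step from model output to label in the loop above can be seen in isolation. In this sketch the softmax output is made up for illustration; only `class_names` is taken from the notebook.

```python
import numpy as np

class_names = ['mosquito', 'tick']     # as printed earlier in the notebook
output = np.array([[0.31, 0.69]])      # made-up softmax output for one image, shape (1, 2)

pred = int(np.argmax(output[0]))       # index of the largest probability
label = class_names[pred]              # map the index back to a class name
probability = float(output[0][pred])   # the value shown as "Probability" in the title

print(label, probability)
```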

The low accuracy is due to the very small number of samples in the training set. Even with augmentation, the dataset is too small to give high accuracy. This notebook is a proof of concept showing how neural networks can be used to differentiate between insect bites.

deepC

In [ ]:
model.save('insect.h5')

!deepCC insect.h5
[INFO]
Reading [keras model] 'insect.h5'