Cainvas

Malaria Parasite Detection

Credit: AITS Cainvas Community

Photo by Kurzgesagt — In a Nutshell on YouTube

Malaria is a life-threatening disease caused by parasites that are transmitted to people through the bites of infected female Anopheles mosquitoes. This notebook uses a convolutional neural network (CNN) to predict whether a cell image from a thin blood smear is parasitized or uninfected.

This notebook uses highly processed images from the Malaria Dataset from the National Library of Medicine.

Each color image is converted to a 50x50 grayscale image, which reduces the size of the dataset from ~350 MB to about 40 MB.

Importing the dataset

In [1]:
!wget -N "https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/malaria-dataset-processed.zip"
!unzip -qo malaria-dataset-processed.zip
!rm malaria-dataset-processed.zip
--2021-07-01 12:23:21--  https://cainvas-static.s3.amazonaws.com/media/user_data/um4ng-tiw0/malaria-dataset-processed.zip
Resolving cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)... 52.219.66.0
Connecting to cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)|52.219.66.0|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 43257796 (41M) [application/x-zip-compressed]
Saving to: ‘malaria-dataset-processed.zip’

malaria-dataset-pro 100%[===================>]  41.25M  94.3MB/s    in 0.4s    

2021-07-01 12:23:22 (94.3 MB/s) - ‘malaria-dataset-processed.zip’ saved [43257796/43257796]

In [2]:
#Importing necessary libraries
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPool2D
from tensorflow.keras.layers import ZeroPadding2D
from tensorflow.keras.layers import Dropout
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import numpy as np
import matplotlib.pyplot as plt
import cv2
In [3]:
img = cv2.imread("cell_images/train/Parasitized/C33P1thinF_IMG_20150619_114756a_cell_181.png")
plt.title("Infected cell")
plt.imshow(img)
Out[3]:
<matplotlib.image.AxesImage at 0x7fa1285975c0>
In [4]:
img = cv2.imread("cell_images/train/Uninfected/C1_thinF_IMG_20150604_104722_cell_79.png")
plt.title("Uninfected cell")
plt.imshow(img)
Out[4]:
<matplotlib.image.AxesImage at 0x7fa1264efd30>
In [5]:
image_size = (50, 50)

Preparing the data

We use ImageDataGenerator from Keras in the following cells to load the images together with their labels for training the neural network. A quarter of the images in cell_images/train is held out for validation (validation_split = 0.25).

In [6]:
datagen = ImageDataGenerator(rescale = 1/255.0, validation_split = 0.25)
In [7]:
train_data_generator = datagen.flow_from_directory(directory="cell_images/train", target_size = image_size, class_mode="binary", batch_size = 16, subset = "training")
Found 16536 images belonging to 2 classes.
In [8]:
validation_data_generator = datagen.flow_from_directory(directory="cell_images/train", target_size = image_size, class_mode="binary", batch_size = 16, subset = "validation")
Found 5510 images belonging to 2 classes.
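
The two "Found ... images" counts can be reproduced with a little arithmetic. The sketch below assumes that Keras slices each class folder independently, sending the first int(validation_split * n) files to validation and the rest to training, and that both classes hold 11,023 images each. The per-class count is an assumption inferred from the totals above, not something the notebook states.

```python
import math

def split_counts(files_per_class, validation_split):
    """Return (training, validation) file counts for one class folder.

    The first int(validation_split * n) files go to validation,
    the remaining files go to training.
    """
    n_val = int(validation_split * files_per_class)
    return files_per_class - n_val, n_val

# Assumption: both classes in cell_images/train hold 11,023 images each.
files_per_class = 11023
n_classes = 2
batch_size = 16

train_per_class, val_per_class = split_counts(files_per_class, 0.25)
n_train = n_classes * train_per_class
n_val = n_classes * val_per_class

print(n_train, n_val)                    # 16536 5510
print(math.ceil(n_train / batch_size))   # 1034 training batches per epoch
print(math.ceil(n_val / batch_size))     # 345 validation batches
```

The 1034 training batches match the step count shown in the training log further down.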

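Label 0 means the cell is parasitized and label 1 means it is uninfected. flow_from_directory assigns the labels in alphanumeric order of the class folder names (Parasitized, Uninfected).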

In [9]:
train_data_generator.labels
Out[9]:
array([0, 0, 0, ..., 1, 1, 1], dtype=int32)
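
The label order follows from the class folder names. As a minimal sketch, assuming the generator sorts the subfolder names and numbers them from 0, the mapping can be reproduced in plain Python. The folder names are taken from the image paths used earlier in this notebook.

```python
# Class subfolders under cell_images/train, deliberately listed out of order.
class_dirs = ["Uninfected", "Parasitized"]

# Sorting the folder names and enumerating them reproduces the label mapping.
class_indices = {name: index for index, name in enumerate(sorted(class_dirs))}
print(class_indices)  # {'Parasitized': 0, 'Uninfected': 1}
```

In the notebook itself, train_data_generator.class_indices should show the actual mapping used by the generator.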

The Model

In [10]:
model = Sequential()

model.add(Conv2D(16, (3,3), input_shape=(*image_size, 3), activation="relu"))
model.add(MaxPool2D(2,2))
# model.add(Dropout(0.2))

model.add(Conv2D(32, (3,3), activation="relu"))
model.add(MaxPool2D(2,2))
# model.add(Dropout(0.3))

model.add(Conv2D(16, (3,3), activation="relu"))
model.add(MaxPool2D(2,2))

model.add(Flatten())
model.add(Dense(32, activation="relu"))
# model.add(Dense(64, activation="relu"))
# model.add(Dropout(0.5))

model.add(Dense(1, activation="sigmoid"))
In [11]:
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 48, 48, 16)        448       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 24, 24, 16)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 22, 22, 32)        4640      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 11, 11, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 9, 9, 16)          4624      
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 4, 4, 16)          0         
_________________________________________________________________
flatten (Flatten)            (None, 256)               0         
_________________________________________________________________
dense (Dense)                (None, 32)                8224      
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 33        
=================================================================
Total params: 17,969
Trainable params: 17,969
Non-trainable params: 0
_________________________________________________________________
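
The output shapes and parameter counts in the summary can be checked by hand. The sketch below assumes what the model definition implies: 3x3 kernels with no padding and stride 1, 2x2 max pooling, and one bias per filter or unit. The helper names are illustrative only.

```python
def conv_out(size, kernel=3):
    """Spatial size after an unpadded convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after max pooling with stride equal to the pool size."""
    return size // pool

def conv_params(in_channels, filters, kernel=3):
    """Weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units

size, channels = 50, 3   # 50x50 input with 3 channels
total = 0
for filters in (16, 32, 16):
    total += conv_params(channels, filters)
    size = pool_out(conv_out(size))
    channels = filters

flat = size * size * channels
total += dense_params(flat, 32) + dense_params(32, 1)
print(size, flat, total)  # 4 256 17969
```

The spatial size shrinks 50 -> 48 -> 24 -> 22 -> 11 -> 9 -> 4, so the Flatten layer yields 4 * 4 * 16 = 256 features.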
In [12]:
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=['accuracy'])
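
For reference, binary cross-entropy averages -(y * log(p) + (1 - y) * log(1 - p)) over the samples, where y is the true label and p is the sigmoid output. The sketch below is a plain-Python illustration, not the Keras implementation. The clipping constant eps is an assumption and the example probabilities are made up.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Hypothetical predictions for two cells: one uninfected (1), one parasitized (0).
print(round(binary_crossentropy([1, 0], [0.9, 0.1]), 4))  # 0.1054, confident and right
print(round(binary_crossentropy([1, 0], [0.1, 0.9]), 4))  # 2.3026, confident and wrong
```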
In [13]:
cb = [tf.keras.callbacks.EarlyStopping(monitor = 'val_loss', patience = 5, restore_best_weights = True)]

history=model.fit(train_data_generator, 
                  steps_per_epoch=len(train_data_generator), 
                  epochs=50, 
                  validation_data=validation_data_generator, 
                  validation_steps = len(validation_data_generator), 
                  callbacks=cb)
Epoch 1/50
1034/1034 [==============================] - 5s 5ms/step - loss: 0.6757 - accuracy: 0.5605 - val_loss: 0.6007 - val_accuracy: 0.6775
Epoch 2/50
1034/1034 [==============================] - 5s 5ms/step - loss: 0.4013 - accuracy: 0.8113 - val_loss: 0.2723 - val_accuracy: 0.8897
Epoch 3/50
1034/1034 [==============================] - 5s 5ms/step - loss: 0.2392 - accuracy: 0.9053 - val_loss: 0.2029 - val_accuracy: 0.9200
Epoch 4/50
1034/1034 [==============================] - 5s 5ms/step - loss: 0.1904 - accuracy: 0.9274 - val_loss: 0.1812 - val_accuracy: 0.9298
Epoch 5/50
1034/1034 [==============================] - 5s 5ms/step - loss: 0.1689 - accuracy: 0.9380 - val_loss: 0.1863 - val_accuracy: 0.9307
Epoch 6/50
1034/1034 [==============================] - 5s 5ms/step - loss: 0.1584 - accuracy: 0.9416 - val_loss: 0.1609 - val_accuracy: 0.9407
Epoch 7/50
1034/1034 [==============================] - 5s 5ms/step - loss: 0.1479 - accuracy: 0.9454 - val_loss: 0.1744 - val_accuracy: 0.9361
Epoch 8/50
1034/1034 [==============================] - 5s 5ms/step - loss: 0.1410 - accuracy: 0.9477 - val_loss: 0.1597 - val_accuracy: 0.9416
Epoch 9/50
1034/1034 [==============================] - 5s 5ms/step - loss: 0.1367 - accuracy: 0.9501 - val_loss: 0.1644 - val_accuracy: 0.9377
Epoch 10/50
1034/1034 [==============================] - 5s 5ms/step - loss: 0.1279 - accuracy: 0.9542 - val_loss: 0.1617 - val_accuracy: 0.9407
Epoch 11/50
1034/1034 [==============================] - 5s 5ms/step - loss: 0.1222 - accuracy: 0.9561 - val_loss: 0.1699 - val_accuracy: 0.9390
Epoch 12/50
1034/1034 [==============================] - 5s 5ms/step - loss: 0.1177 - accuracy: 0.9576 - val_loss: 0.1806 - val_accuracy: 0.9348
Epoch 13/50
1034/1034 [==============================] - 5s 5ms/step - loss: 0.1134 - accuracy: 0.9584 - val_loss: 0.1681 - val_accuracy: 0.9381
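
Training stopped after epoch 13 rather than running all 50 epochs. A minimal sketch of the patience rule, assuming an epoch counts as an improvement only when val_loss drops below the best value seen so far, reproduces this from the log. It is a simplified stand-in for the EarlyStopping callback, not its actual code.

```python
# Validation losses copied from the training log above, epochs 1 to 13.
val_losses = [0.6007, 0.2723, 0.2029, 0.1812, 0.1863, 0.1609, 0.1744,
              0.1597, 0.1644, 0.1617, 0.1699, 0.1806, 0.1681]

def early_stopping(losses, patience):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    stop_epoch is None if the patience budget is never exhausted.
    """
    best_loss, best_epoch, wait = float("inf"), None, 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, None

print(early_stopping(val_losses, patience=5))  # (8, 13)
```

The best val_loss (0.1597) occurs at epoch 8, and five epochs without improvement follow. With restore_best_weights = True, the model keeps the weights from epoch 8.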
In [14]:
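# NOTE: datagen_test is never used. The generator below is built from datagen (validation_split = 0.25)
# with subset = "training", so only about 75% of the images in cell_images/valid are loaded.
datagen_test = ImageDataGenerator(rescale = 1/255.0)
test_data_generator = datagen.flow_from_directory(directory="cell_images/valid", target_size = image_size, class_mode="binary", batch_size = 16, subset = "training")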
Found 4134 images belonging to 2 classes.
In [15]:
test_data_generator.labels
Out[15]:
array([0, 0, 0, ..., 1, 1, 1], dtype=int32)
In [16]:
model.evaluate(test_data_generator)
259/259 [==============================] - 1s 4ms/step - loss: 0.1590 - accuracy: 0.9415
Out[16]:
[0.15897414088249207, 0.9414610266685486]
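
The accuracy is easier to read as a count. The sketch below multiplies it by the number of test images reported above and rounds to the nearest whole image, assuming every image counts equally.

```python
n_images = 4134                  # from "Found 4134 images"
accuracy = 0.9414610266685486    # second value returned by model.evaluate

n_correct = round(accuracy * n_images)
print(n_correct, n_images - n_correct)  # 3892 242
```

So roughly 242 of the 4134 evaluated cells are misclassified.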
In [17]:
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
In [18]:
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
In [19]:
# Take one batch (16 images and their labels) from the test generator
x, y = next(test_data_generator)

# Predict the first 10 images in a single call and threshold the sigmoid output at 0.5
pred_array = (model.predict(x[:10]) > 0.5).astype(int).ravel().tolist()

print("Predicted Values:", pred_array)
print("Actual Values:", y[:10])
Predicted Values: [0, 1, 1, 1, 1, 1, 0, 0, 1, 1]
Actual Values: [0. 1. 1. 1. 1. 1. 0. 0. 1. 1.]
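
To make these 0/1 values easier to read, they can be mapped back to class names. The sketch below uses the same strict "> 0.5" threshold as the cell above. The probabilities are made-up values, not outputs of the trained model.

```python
CLASS_NAMES = {0: "Parasitized", 1: "Uninfected"}

def to_label(probability, threshold=0.5):
    """Map a sigmoid output to a class index using a strict > threshold."""
    return 1 if probability > threshold else 0

# Hypothetical sigmoid outputs, not taken from the trained model.
probabilities = [0.03, 0.97, 0.5, 0.51]
labels = [to_label(p) for p in probabilities]
print(labels)                            # [0, 1, 0, 1]
print([CLASS_NAMES[i] for i in labels])  # ['Parasitized', 'Uninfected', 'Parasitized', 'Uninfected']
```

An output of exactly 0.5 falls on the Parasitized side because the comparison is strict.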

Visualizing the predictions of the trained model on unseen data

In [20]:
plt.figure(figsize = (15, 6))
for i in range(10):
    plt.subplot(2, 5, i+1)   # 2 rows x 5 columns for the 10 images
    plt.imshow(x[i])
    plt.title('Original: {}, Predicted: {}'.format(int(y[i]), pred_array[i]))
    plt.axis('off')

plt.tight_layout()
plt.show()
In [21]:
model.save("malaria_parasite_detection.h5")

deepCC

In [22]:
!deepCC "malaria_parasite_detection.h5"
[INFO]
Reading [keras model] 'malaria_parasite_detection.h5'
[SUCCESS]
Saved 'malaria_parasite_detection_deepC/malaria_parasite_detection.onnx'
[INFO]
Reading [onnx model] 'malaria_parasite_detection_deepC/malaria_parasite_detection.onnx'
[INFO]
Model info:
  ir_vesion : 5
  doc       : 
[WARNING]
[ONNX]: terminal (input/output) conv2d_input's shape is less than 1. Changing it to 1.
[WARNING]
[ONNX]: terminal (input/output) dense_1's shape is less than 1. Changing it to 1.
WARN (GRAPH): found operator node with the same name (dense_1) as io node.
[INFO]
Running DNNC graph sanity check ...
[SUCCESS]
Passed sanity check.
[INFO]
Writing C++ file 'malaria_parasite_detection_deepC/malaria_parasite_detection.cpp'
[INFO]
deepSea model files are ready in 'malaria_parasite_detection_deepC/' 
[RUNNING COMMAND]
g++ -std=c++11 -O3 -fno-rtti -fno-exceptions -I. -I/opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/include -isystem /opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/packages/eigen-eigen-323c052e1731 "malaria_parasite_detection_deepC/malaria_parasite_detection.cpp" -D_AITS_MAIN -o "malaria_parasite_detection_deepC/malaria_parasite_detection.exe"
[RUNNING COMMAND]
size "malaria_parasite_detection_deepC/malaria_parasite_detection.exe"
   text	   data	    bss	    dec	    hex	filename
 244797	   3760	    760	 249317	  3cde5	malaria_parasite_detection_deepC/malaria_parasite_detection.exe
[SUCCESS]
Saved model as executable "malaria_parasite_detection_deepC/malaria_parasite_detection.exe"