
Detecting COVID-19 using lung CT scans

Credit: AITS Cainvas Community

Photo by Cloudy gif

Using lung CT scans to predict whether a person has COVID-19.

Deep learning models have proven useful and efficient in the medical field, processing scans, X-rays and other medical data to extract clinically useful information.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from keras import layers
import os
import random

Dataset

In [2]:
!wget -N https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/COVID_CT_SCAN.zip
!unzip -qo COVID_CT_SCAN.zip
!rm COVID_CT_SCAN.zip
--2020-11-10 06:06:38--  https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/COVID_CT_SCAN.zip
Resolving cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)... 52.219.66.96
Connecting to cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)|52.219.66.96|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 97368158 (93M) [application/zip]
Saving to: ‘COVID_CT_SCAN.zip’

COVID_CT_SCAN.zip   100%[===================>]  92.86M  67.1MB/s    in 1.4s    

2020-11-10 06:06:39 (67.1 MB/s) - ‘COVID_CT_SCAN.zip’ saved [97368158/97368158]

The dataset contains the following:

* CT_COVID - images corresponding to positive COVID-19 cases.

* CT_NonCOVID - images corresponding to negative COVID-19 cases.

* An .xlsx file - the metadata of the images (a quick way to peek at it is sketched below).
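
The exact filename of the metadata spreadsheet is not listed above, so this optional sketch simply searches the extracted folder for any .xlsx file and previews it with pandas (pd.read_excel needs an Excel engine such as openpyxl installed).

In [ ]:
# Optional sketch: preview the metadata spreadsheet.
# The filename is assumed to be the first .xlsx found in the extracted folder.
xlsx_files = [f for f in os.listdir('COVID_CT_SCAN') if f.endswith('.xlsx')]
if xlsx_files:
    meta = pd.read_excel(os.path.join('COVID_CT_SCAN', xlsx_files[0]))
    print(meta.head())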
In [3]:
data_dir = 'COVID_CT_SCAN'

print("Number of samples")
for f in os.listdir(data_dir + '/'):
    if os.path.isdir(data_dir + '/' + f):
        print(f, " : ", len(os.listdir(data_dir + '/' + f +'/')))
Number of samples
CT_NonCOVID  :  397
CT_COVID  :  349

It's an almost balanced dataset.
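
The split (349 vs 397) is close enough that no rebalancing is applied here; for reference, if the mild skew ever needed compensating, class weights could be derived from those counts. A small sketch, with the 0/1 keys following the alphabetical label order used by the dataset loader below (0 = CT_COVID, 1 = CT_NonCOVID):

In [ ]:
# Optional sketch: per-class weights computed from the counts printed above.
# They could be passed as model.fit(..., class_weight=class_weight)
# if the mild imbalance ever needed compensating.
counts = {0: 349, 1: 397}      # 0 = CT_COVID, 1 = CT_NonCOVID (alphabetical label order)
total = sum(counts.values())
class_weight = {k: total / (len(counts) * v) for k, v in counts.items()}
print(class_weight)            # roughly {0: 1.07, 1: 0.94}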

In [4]:
batch_size = 64

print("Training set")
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="training",
  seed=113, 
  batch_size=batch_size)

print("Validation set")
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="validation",
  seed=113, 
  batch_size=batch_size)
Training set
Found 746 files belonging to 2 classes.
Using 597 files for training.
Validation set
Found 746 files belonging to 2 classes.
Using 149 files for validation.

Looking into the classes

In [5]:
class_names = train_ds.class_names
print(class_names)
['CT_COVID', 'CT_NonCOVID']

Visualization

In [6]:
plt.figure(figsize=(10, 10))
for images, labels in train_ds.take(1):
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(class_names[labels[i]])
        plt.axis("off")
In [7]:
print("Shape of one training batch")

for image_batch, labels_batch in train_ds:
    print("Input: ", image_batch.shape)
    print("Labels: ", labels_batch.shape)
    break
Shape of one training batch
Input:  (64, 256, 256, 3)
Labels:  (64,)
In [8]:
AUTOTUNE = tf.data.experimental.AUTOTUNE

train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
In [9]:
# Normalizing the pixel values

normalization_layer = layers.experimental.preprocessing.Rescaling(1./255)

train_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
val_ds = val_ds.map(lambda x, y: (normalization_layer(x), y))
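
As a quick sanity check (not part of the original run), the rescaled batches should now hold pixel values in the [0, 1] range:

In [ ]:
# Sanity-check sketch: after Rescaling(1./255) the pixel values should lie in [0, 1]
image_batch, _ = next(iter(train_ds))
print("min:", float(tf.reduce_min(image_batch)), "max:", float(tf.reduce_max(image_batch)))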

Model

In [10]:
model = tf.keras.models.Sequential([
  layers.Conv2D(16, 3, padding='same', activation='relu', input_shape = (256, 256, 3)),
  layers.MaxPooling2D(),
  layers.Conv2D(32, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(64, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),  
  layers.Flatten(),
  layers.Dense(128, activation='relu'),
  layers.Dense(64, activation = 'relu'),  
  layers.Dense(1, activation = 'sigmoid')
])
In [11]:
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=['accuracy'])

epochs=16
history = model.fit(train_ds, validation_data=val_ds, epochs=epochs)
Epoch 1/16
10/10 [==============================] - 18s 2s/step - loss: 0.7786 - accuracy: 0.4891 - val_loss: 0.6710 - val_accuracy: 0.5705
Epoch 2/16
10/10 [==============================] - 17s 2s/step - loss: 0.6572 - accuracy: 0.6147 - val_loss: 0.6395 - val_accuracy: 0.5973
Epoch 3/16
10/10 [==============================] - 17s 2s/step - loss: 0.6375 - accuracy: 0.6114 - val_loss: 0.6138 - val_accuracy: 0.6376
Epoch 4/16
10/10 [==============================] - 17s 2s/step - loss: 0.5602 - accuracy: 0.6918 - val_loss: 0.5143 - val_accuracy: 0.7181
Epoch 5/16
10/10 [==============================] - 17s 2s/step - loss: 0.4919 - accuracy: 0.7554 - val_loss: 0.4907 - val_accuracy: 0.7383
Epoch 6/16
10/10 [==============================] - 18s 2s/step - loss: 0.4036 - accuracy: 0.8057 - val_loss: 0.5750 - val_accuracy: 0.7181
Epoch 7/16
10/10 [==============================] - 17s 2s/step - loss: 0.3330 - accuracy: 0.8576 - val_loss: 0.4624 - val_accuracy: 0.7718
Epoch 8/16
10/10 [==============================] - 17s 2s/step - loss: 0.2401 - accuracy: 0.9062 - val_loss: 0.5776 - val_accuracy: 0.7248
Epoch 9/16
10/10 [==============================] - 17s 2s/step - loss: 0.1861 - accuracy: 0.9229 - val_loss: 0.6828 - val_accuracy: 0.7450
Epoch 10/16
10/10 [==============================] - 17s 2s/step - loss: 0.1825 - accuracy: 0.9347 - val_loss: 0.8013 - val_accuracy: 0.7450
Epoch 11/16
10/10 [==============================] - 17s 2s/step - loss: 0.1326 - accuracy: 0.9481 - val_loss: 0.6314 - val_accuracy: 0.7651
Epoch 12/16
10/10 [==============================] - 17s 2s/step - loss: 0.0884 - accuracy: 0.9715 - val_loss: 0.8357 - val_accuracy: 0.7584
Epoch 13/16
10/10 [==============================] - 17s 2s/step - loss: 0.0684 - accuracy: 0.9765 - val_loss: 0.6462 - val_accuracy: 0.7919
Epoch 14/16
10/10 [==============================] - 17s 2s/step - loss: 0.0397 - accuracy: 0.9933 - val_loss: 1.0074 - val_accuracy: 0.7315
Epoch 15/16
10/10 [==============================] - 17s 2s/step - loss: 0.0560 - accuracy: 0.9883 - val_loss: 0.7320 - val_accuracy: 0.7651
Epoch 16/16
10/10 [==============================] - 18s 2s/step - loss: 0.0345 - accuracy: 0.9899 - val_loss: 0.7675 - val_accuracy: 0.7718
In [12]:
model.evaluate(val_ds)
3/3 [==============================] - 1s 246ms/step - loss: 0.7675 - accuracy: 0.7718
Out[12]:
[0.7675203680992126, 0.7718120813369751]

There is a noticeable gap between the training and validation accuracy. This high variance (overfitting) can be reduced by training on a larger dataset, which should also lead to higher validation accuracy.
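
If collecting more scans is not an option, on-the-fly augmentation is a common alternative. The sketch below (not used in the training run recorded above) shows Keras preprocessing layers that could be prepended to the first Conv2D layer:

In [ ]:
# Optional sketch: light augmentation layers that could be placed in front of the
# model to artificially enlarge the training set. Not used in the run above.
data_augmentation = tf.keras.Sequential([
    layers.experimental.preprocessing.RandomFlip("horizontal"),
    layers.experimental.preprocessing.RandomRotation(0.05),
    layers.experimental.preprocessing.RandomZoom(0.1),
])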

Plotting the metrics

In [13]:
def plot(history, variable, variable2):
    # plot the training and validation curves for the given metric
    plt.plot(range(len(history[variable])), history[variable])
    plt.plot(range(len(history[variable2])), history[variable2])
    plt.legend((variable, variable2))
    plt.title(variable)
In [14]:
plot(history.history, "accuracy", 'val_accuracy')
In [15]:
plot(history.history, "loss", "val_loss")

Prediction

In [17]:
# pick random test data sample from one batch
x = random.randint(0, batch_size - 1)

for i in val_ds.as_numpy_iterator():
    img, label = i    
    plt.axis('off')   # remove axes
    plt.imshow(img[x])    # shape from (64, 256, 256, 3) --> (256, 256, 3)
    output = model.predict(np.expand_dims(img[x],0))    # getting output; input shape (256, 256, 3) --> (1, 256, 256, 3)
    pred = int(output[0][0]>0.5)   
    print("Prdicted: ", class_names[pred])    # Picking the label from class_names base don the model output
    print("True: ", class_names[label[x]], "( ", output[0][0], " --> ", pred, " )")
    break
Predicted:  CT_COVID
True:  CT_NonCOVID (  0.006251335  -->  0  )

deepC

Compiling the saved Keras model with the deepC compiler (deepCC): it converts the model to ONNX, generates C++ code, and builds a standalone executable that can run the network without Python.

In [18]:
model.save('lungct.h5')

!deepCC lungct.h5
reading [keras model] from 'lungct.h5'
Saved 'lungct.onnx'
reading onnx model from file  lungct.onnx
Model info:
  ir_vesion :  5 
  doc       : 
WARN (ONNX): graph-node conv2d's attribute auto_pad has no meaningful data.
WARN (ONNX): graph-node conv2d_1's attribute auto_pad has no meaningful data.
WARN (ONNX): graph-node conv2d_2's attribute auto_pad has no meaningful data.
WARN (ONNX): terminal (input/output) conv2d_input's shape is less than 1.
             changing it to 1.
WARN (ONNX): terminal (input/output) dense_2's shape is less than 1.
             changing it to 1.
WARN (GRAPH): found operator node with the same name (dense_2) as io node.
running DNNC graph sanity check ... passed.
Writing C++ file  lungct_deepC/lungct.cpp
INFO (ONNX): model files are ready in dir lungct_deepC
g++ -std=c++11 -O3 -I. -I/opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/include -isystem /opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/packages/eigen-eigen-323c052e1731 lungct_deepC/lungct.cpp -o lungct_deepC/lungct.exe
Model executable  lungct_deepC/lungct.exe
In [21]:
# pick random test data sample from the batch
x = random.randint(0, batch_size - 1)

for i in val_ds.as_numpy_iterator():    
    img, label = i      # i contains one batch of validation samples
    np.savetxt('sample.data', (img[x]).flatten())    # xth sample into text file
    plt.axis('off')
    plt.imshow(img[x])
    print("True: ", class_names[label[x]])
    break

# run exe with input
!lungct_deepC/lungct.exe sample.data

# show predicted output
nn_out = np.loadtxt('dense_2.out')
#print(class_names, nn_out)
print ("Model predicted: ", class_names[int(nn_out>0.5)], " (", nn_out, " --> ", int(nn_out>0.5), ")")
True:  CT_NonCOVID
reading file sample.data.
Warn: conv2d_Relu_0_pooling: auto_pad attribute is deprecated, it'll be ignored.
Warn: conv2d_1_Relu_0_pooling: auto_pad attribute is deprecated, it'll be ignored.
Warn: conv2d_2_Relu_0_pooling: auto_pad attribute is deprecated, it'll be ignored.
writing file dense_2.out.
Model predicted:  CT_NonCOVID  ( 0.999953  -->  1 )
In [ ]: