Detecting Covid19 using lung CT scans

Credit: AITS Cainvas Community

Photo by Cloudy gif

This notebook uses lung CT scans to predict whether a person has COVID-19.

Deep learning models have proven useful and efficient in the medical field for processing scans, X-rays and other medical data and extracting clinically relevant information from them.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import layers
import os
import random

Dataset

In [2]:
!wget -N https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/COVID_CT_SCAN.zip
!unzip -qo COVID_CT_SCAN.zip
!rm COVID_CT_SCAN.zip
--2021-09-08 07:42:08--  https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/COVID_CT_SCAN.zip
Resolving cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)... 52.219.156.47
Connecting to cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)|52.219.156.47|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 97368158 (93M) [application/zip]
Saving to: ‘COVID_CT_SCAN.zip’

COVID_CT_SCAN.zip   100%[===================>]  92.86M  77.1MB/s    in 1.2s    

2021-09-08 07:42:10 (77.1 MB/s) - ‘COVID_CT_SCAN.zip’ saved [97368158/97368158]

The dataset has the following:

* CT_COVID - This folder contains CT images of COVID-19-positive cases.

* CT_NonCOVID - This folder contains CT images of COVID-19-negative cases.

* An xlsx file - Contains the metadata of the images.

In [3]:
data_dir = 'COVID_CT_SCAN'

print("Number of samples")
for f in os.listdir(data_dir + '/'):
    if os.path.isdir(data_dir + '/' + f):
        print(f, " : ", len(os.listdir(data_dir + '/' + f +'/')))
Number of samples
CT_COVID  :  349
CT_NonCOVID  :  397

It's an almost balanced dataset.
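
To put a number on "almost balanced", the class shares can be worked out from the counts printed above. This is a minimal sketch: the `counts` dictionary is hard-coded from that output rather than read from disk, and `imbalance_ratio` is an illustrative name, not something defined elsewhere in the notebook.

```python
# Class counts copied from the "Number of samples" output above.
counts = {"CT_COVID": 349, "CT_NonCOVID": 397}

total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")

# Ratio of the larger class to the smaller one; 1.0 would be perfectly balanced.
imbalance_ratio = max(counts.values()) / min(counts.values())
print(f"Imbalance ratio: {imbalance_ratio:.2f}")
```

With roughly 47% positive and 53% negative samples, plain accuracy is a reasonable first metric here, which is what the model below is compiled with.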

In [4]:
batch_size = 64

print("Training set")
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="training",
  seed=113, 
  batch_size=batch_size)

print("Validation set")
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="validation",
  seed=113, 
  batch_size=batch_size)
Training set
Found 746 files belonging to 2 classes.
Using 597 files for training.
Validation set
Found 746 files belonging to 2 classes.
Using 149 files for validation.
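
The split sizes reported above, and the number of batches each dataset yields, follow from simple arithmetic. The sketch below reproduces them under one stated assumption: that the validation count is the truncated product of the file count and `validation_split`, which matches the 149 / 597 output. The variable names are illustrative.

```python
import math

total_files = 746        # from "Found 746 files belonging to 2 classes."
validation_split = 0.2
batch_size = 64

# Assumption: validation size is the truncated product, which
# reproduces the 149 / 597 split reported above.
n_val = int(total_files * validation_split)
n_train = total_files - n_val

# The last batch of each dataset is smaller than batch_size.
train_steps = math.ceil(n_train / batch_size)
val_steps = math.ceil(n_val / batch_size)
last_train_batch = n_train - (train_steps - 1) * batch_size
last_val_batch = n_val - (val_steps - 1) * batch_size

print("train:", n_train, "files,", train_steps, "batches, last batch", last_train_batch)
print("val:  ", n_val, "files,", val_steps, "batches, last batch", last_val_batch)
```

These batch counts are consistent with the `10/10` and `3/3` progress counters in the training and evaluation logs further down.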

Looking into the classes

In [5]:
class_names = train_ds.class_names
print(class_names)
['CT_COVID', 'CT_NonCOVID']

Visualization

In [6]:
plt.figure(figsize=(10, 10))
for images, labels in train_ds.take(1):
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(class_names[labels[i]])
        plt.axis("off")
In [7]:
print("Shape of one training batch")

for image_batch, labels_batch in train_ds:
    print("Input: ", image_batch.shape)
    print("Labels: ", labels_batch.shape)
    break
Shape of one training batch
Input:  (64, 256, 256, 3)
Labels:  (64,)
In [8]:
AUTOTUNE = tf.data.experimental.AUTOTUNE

train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
In [9]:
# Normalizing the pixel values

normalization_layer = layers.experimental.preprocessing.Rescaling(1./255)

train_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
val_ds = val_ds.map(lambda x, y: (normalization_layer(x), y))
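
The rescaling layer above multiplies every pixel by 1/255, so 8-bit intensities in [0, 255] become floats in [0, 1]. The pure-Python sketch below shows the same mapping on a few values; `rescale` is a hypothetical helper used only for illustration, not part of Keras.

```python
scale = 1.0 / 255

def rescale(pixel):
    """Map an 8-bit pixel value in [0, 255] to a float in [0, 1]."""
    return pixel * scale

for value in (0, 128, 255):
    print(value, "->", round(rescale(value), 4))
```

Keeping inputs in a small, fixed range generally helps gradient-based training behave more predictably.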

Model

In [10]:
model = tf.keras.models.Sequential([
  layers.Conv2D(16, 3, padding='same', activation='relu', input_shape = (256, 256, 3)),
  layers.MaxPooling2D(),
  layers.Conv2D(32, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(64, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),  
  layers.Flatten(),
  layers.Dense(128, activation='relu'),
  layers.Dense(64, activation = 'relu'),  
  layers.Dense(1, activation = 'sigmoid')
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=['accuracy'])

model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 256, 256, 16)      448       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 128, 128, 16)      0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 128, 128, 32)      4640      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 64, 64, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 64, 64, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 32, 32, 64)        0         
_________________________________________________________________
flatten (Flatten)            (None, 65536)             0         
_________________________________________________________________
dense (Dense)                (None, 128)               8388736   
_________________________________________________________________
dense_1 (Dense)              (None, 64)                8256      
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 65        
=================================================================
Total params: 8,420,641
Trainable params: 8,420,641
Non-trainable params: 0
_________________________________________________________________
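
The parameter counts in the summary can be checked by hand. A 3x3 convolution has one weight per kernel position per input channel plus one bias, per filter; a dense layer has one weight per input plus one bias, per unit. The helper names below are illustrative and not part of Keras.

```python
def conv2d_params(kernel, in_channels, filters):
    # kernel x kernel x in_channels weights plus one bias, for each filter.
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    # One weight per input plus one bias, for each unit.
    return (inputs + 1) * units

layer_params = {
    "conv2d": conv2d_params(3, 3, 16),
    "conv2d_1": conv2d_params(3, 16, 32),
    "conv2d_2": conv2d_params(3, 32, 64),
    "dense": dense_params(32 * 32 * 64, 128),   # flattened 32x32x64 feature map
    "dense_1": dense_params(128, 64),
    "dense_2": dense_params(64, 1),
}

total_params = sum(layer_params.values())
for name, n in layer_params.items():
    print(f"{name:10s} {n:>10,d}  ({n / total_params:.2%})")
print(f"{'total':10s} {total_params:>10,d}")
```

Almost all of the 8,420,641 parameters sit in the first dense layer, because it connects every one of the 65,536 flattened features to 128 units.
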
In [11]:
epochs=16
history = model.fit(train_ds, validation_data=val_ds, epochs=epochs)
Epoch 1/16
10/10 [==============================] - 1s 147ms/step - loss: 0.9583 - accuracy: 0.5327 - val_loss: 0.6361 - val_accuracy: 0.6309
Epoch 2/16
10/10 [==============================] - 1s 68ms/step - loss: 0.6413 - accuracy: 0.6214 - val_loss: 0.5692 - val_accuracy: 0.7584
Epoch 3/16
10/10 [==============================] - 1s 68ms/step - loss: 0.5932 - accuracy: 0.6851 - val_loss: 0.5408 - val_accuracy: 0.7584
Epoch 4/16
10/10 [==============================] - 1s 69ms/step - loss: 0.5312 - accuracy: 0.7454 - val_loss: 0.4852 - val_accuracy: 0.7114
Epoch 5/16
10/10 [==============================] - 1s 68ms/step - loss: 0.4459 - accuracy: 0.7722 - val_loss: 0.5387 - val_accuracy: 0.7248
Epoch 6/16
10/10 [==============================] - 1s 68ms/step - loss: 0.3855 - accuracy: 0.8141 - val_loss: 0.4894 - val_accuracy: 0.7248
Epoch 7/16
10/10 [==============================] - 1s 66ms/step - loss: 0.2800 - accuracy: 0.8928 - val_loss: 0.5306 - val_accuracy: 0.7315
Epoch 8/16
10/10 [==============================] - 1s 68ms/step - loss: 0.2085 - accuracy: 0.9196 - val_loss: 0.5611 - val_accuracy: 0.7651
Epoch 9/16
10/10 [==============================] - 1s 69ms/step - loss: 0.1852 - accuracy: 0.9179 - val_loss: 0.7094 - val_accuracy: 0.7517
Epoch 10/16
10/10 [==============================] - 1s 68ms/step - loss: 0.0997 - accuracy: 0.9715 - val_loss: 0.7070 - val_accuracy: 0.7651
Epoch 11/16
10/10 [==============================] - 1s 71ms/step - loss: 0.0707 - accuracy: 0.9749 - val_loss: 0.7207 - val_accuracy: 0.7517
Epoch 12/16
10/10 [==============================] - 1s 68ms/step - loss: 0.0512 - accuracy: 0.9849 - val_loss: 1.0256 - val_accuracy: 0.7114
Epoch 13/16
10/10 [==============================] - 1s 70ms/step - loss: 0.0543 - accuracy: 0.9899 - val_loss: 0.7819 - val_accuracy: 0.7517
Epoch 14/16
10/10 [==============================] - 1s 70ms/step - loss: 0.0270 - accuracy: 0.9933 - val_loss: 1.0060 - val_accuracy: 0.7584
Epoch 15/16
10/10 [==============================] - 1s 69ms/step - loss: 0.0323 - accuracy: 0.9916 - val_loss: 0.9800 - val_accuracy: 0.7718
Epoch 16/16
10/10 [==============================] - 1s 67ms/step - loss: 0.0156 - accuracy: 0.9983 - val_loss: 0.9312 - val_accuracy: 0.7584
In [12]:
model.evaluate(val_ds)
3/3 [==============================] - 0s 11ms/step - loss: 0.9312 - accuracy: 0.7584
Out[12]:
[0.9312022924423218, 0.7583892345428467]

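There is a large gap between training and validation accuracy: by epoch 16 the model reaches 99.8% on the training set but only 75.8% on the validation set, and validation loss rises after epoch 4 while training loss keeps falling. This is overfitting (high variance). Training on a larger dataset should reduce it and improve validation accuracy.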
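
The gap described above is visible in the training log as a validation loss that bottoms out early. The sketch below copies the `val_loss` values from that log, finds the best epoch, and simulates a simple patience rule to show where training could have stopped. The `patience` value of 3 is a hypothetical choice for illustration, and this is not what the notebook actually ran; Keras offers a comparable mechanism through its `EarlyStopping` callback.

```python
# val_loss per epoch, copied from the training log above (epochs 1..16).
val_loss = [0.6361, 0.5692, 0.5408, 0.4852, 0.5387, 0.4894, 0.5306, 0.5611,
            0.7094, 0.7070, 0.7207, 1.0256, 0.7819, 1.0060, 0.9800, 0.9312]

# Epoch (1-based) with the lowest validation loss.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1

patience = 3  # hypothetical: epochs to wait without improvement
best = float("inf")
wait = 0
stop_epoch = len(val_loss)
for epoch, loss in enumerate(val_loss, start=1):
    if loss < best:
        best, wait = loss, 0
    else:
        wait += 1
        if wait >= patience:
            stop_epoch = epoch
            break

print("best epoch:", best_epoch, "val_loss:", best)
print("would stop after epoch:", stop_epoch)
```

Stopping early does not replace collecting more data, but it avoids spending epochs that only fit the training set more tightly.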

Plotting the metrics

In [13]:
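def plot(history, variable, variable2):
    # Plot a training metric and its validation counterpart against the epoch index
    plt.plot(range(len(history[variable])), history[variable], label=variable)
    plt.plot(range(len(history[variable2])), history[variable2], label=variable2)
    plt.title(variable)
    plt.xlabel("epoch")
    plt.legend()
    plt.show()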
In [14]:
plot(history.history, "accuracy", 'val_accuracy')
In [15]:
plot(history.history, "loss", "val_loss")

Prediction

In [16]:
# pick a random sample from the first validation batch
x = random.randint(0, batch_size - 1)

for i in val_ds.as_numpy_iterator():
    img, label = i    
    plt.axis('off')   # remove axes
    plt.imshow(img[x])    # shape from (64, 256, 256, 3) --> (256, 256, 3)
    output = model.predict(np.expand_dims(img[x],0))    # getting output; input shape (256, 256, 3) --> (1, 256, 256, 3)
    pred = int(output[0][0]>0.5)   
    print("Predicted: ", class_names[pred])    # Picking the label from class_names based on the model output
    print("True: ", class_names[label[x]], "( ", output[0][0], " --> ", pred, " )")
    break
Predicted:  CT_COVID
True:  CT_NonCOVID (  0.008491203  -->  0  )
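
The output above is worth decoding step by step. The labels follow the order of `class_names`, so the single sigmoid output is the predicted probability of class index 1 (`CT_NonCOVID`), and anything at or below the 0.5 threshold is read as index 0 (`CT_COVID`). The sketch below applies that rule to the value printed above; `decode` is a hypothetical helper, not part of the notebook.

```python
class_names = ["CT_COVID", "CT_NonCOVID"]  # order printed by train_ds.class_names

def decode(probability, threshold=0.5):
    """Turn a sigmoid output into (class index, class name)."""
    index = int(probability > threshold)
    return index, class_names[index]

index, name = decode(0.008491203)   # model output from the cell above
print(index, name)
```

An output of about 0.008 means the model assigned roughly 99% probability to `CT_COVID` for a scan whose true label is `CT_NonCOVID`: a confident misclassification, in line with the validation accuracy of about 76%.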

deepC

In [20]:
model.save('lungct.h5')

#!deepCC lungct.h5