Lung Cancer Detection from CT scans

Credit: AITS Cainvas Community

Photo by Vladimir Marchukov on Dribbble

In this notebook, we will predict whether a given CT scan slice shows a benign, malignant, or normal case, using the IQ-OTH/NCCD lung cancer dataset. We will employ a Convolutional Neural Network to classify each image into one of the three classes. The dataset contains a total of 1190 images representing CT scan slices of 110 cases; of these, 40 cases are diagnosed as malignant, 15 as benign, and 55 as normal.

Importing the dataset

In [1]:
!wget "https://cainvas-static.s3.amazonaws.com/media/user_data/um4ng-tiw0/Lung_cancer_dataset.zip"
!unzip -qo Lung_cancer_dataset.zip
!rm Lung_cancer_dataset.zip
--2021-07-15 19:38:03--  https://cainvas-static.s3.amazonaws.com/media/user_data/um4ng-tiw0/Lung_cancer_dataset.zip
Resolving cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)... 52.219.62.100
Connecting to cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)|52.219.62.100|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 156629884 (149M) [application/x-zip-compressed]
Saving to: ‘Lung_cancer_dataset.zip’

Lung_cancer_dataset 100%[===================>] 149.37M  48.9MB/s    in 3.1s    

2021-07-15 19:38:06 (48.9 MB/s) - ‘Lung_cancer_dataset.zip’ saved [156629884/156629884]

In [2]:
# Importing the necessary libraries
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import numpy as np
import matplotlib.pyplot as plt
import cv2

Visualizing the dataset

In [3]:
# Display one malignant slice as a sanity check (OpenCV loads images in BGR order)
img = cv2.imread("Lung_cancer_dataset/Train/Malignant cases/Malignant case (10).jpg")
plt.title("Malignant Case")
plt.imshow(img)
Out[3]:
<matplotlib.image.AxesImage at 0x7f8b91f5cd68>
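Before training, it can also help to sanity-check the directory layout. Below is a minimal sketch that counts the images per class sub-folder, assuming one sub-folder per class inside Train, as flow_from_directory expects:

import os

train_dir = "Lung_cancer_dataset/Train"
for cls in sorted(os.listdir(train_dir)):
    # Count the files inside each class folder
    n = len(os.listdir(os.path.join(train_dir, cls)))
    print(f"{cls}: {n} images")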
In [4]:
# Every image will be resized to 64x64 before being fed to the network
img_width = 64
img_height = 64

Preparing the data

In the cells that follow, we use Keras's ImageDataGenerator to load the images and their labels from disk, rescale the pixel values to [0, 1], and hold out 10% of the training images for validation.

In [5]:
# Rescale pixel values to [0, 1] and reserve 10% of the images for validation
datagen = ImageDataGenerator(rescale=1/255.0, validation_split=0.1)
In [6]:
train_data_generator = datagen.flow_from_directory(
    directory="Lung_cancer_dataset/Train",
    target_size=(img_width, img_height),
    color_mode="grayscale",
    class_mode="categorical",
    batch_size=16,
    subset="training",
    shuffle=True,
)
Found 970 images belonging to 3 classes.
In [7]:
validation_data_generator = datagen.flow_from_directory(
    directory="Lung_cancer_dataset/Train",
    target_size=(img_width, img_height),
    color_mode="grayscale",
    class_mode="categorical",
    batch_size=16,
    subset="validation",
)
Found 107 images belonging to 3 classes.
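flow_from_directory assigns class indices alphabetically by sub-folder name, so it is worth printing the mapping once to confirm which index corresponds to which class:

# The exact keys depend on the dataset's folder names
print(train_data_generator.class_indices)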

The labels

We use one-hot encoded labels here because our data is categorical in nature: with class_mode="categorical", each label is a length-3 vector with a 1 at the index of its class.

In [8]:
train_data_generator.next()[1]
Out[8]:
array([[0., 1., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 0., 1.],
       [0., 0., 1.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [1., 0., 0.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 1., 0.]], dtype=float32)
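To recover the integer class index from a one-hot row, take the argmax; for example:

one_hot = np.array([0., 1., 0.])
print(np.argmax(one_hot))  # prints 1, the index of the second class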

The Model

In [9]:
model = Sequential()

# Three convolution + max-pooling blocks extract spatial features
# from the 64x64 grayscale input
model.add(Conv2D(32, (3,3), input_shape=(img_width, img_height, 1), activation="relu"))
model.add(MaxPool2D(2,2))

model.add(Conv2D(64, (3,3), activation="relu"))
model.add(MaxPool2D(3,3))

model.add(Conv2D(32, (3,3), padding="same", activation="relu"))
model.add(MaxPool2D(2,2))

# Flatten the feature maps and classify with a small fully connected head
model.add(Flatten())

model.add(Dense(32, activation="relu"))
#model.add(Dropout(0.2))
model.add(Dense(64, activation="relu"))
#model.add(Dropout(0.3))
model.add(Dense(32, activation="relu"))
#model.add(Dropout(0.4))

# Softmax over the three classes: benign, malignant, normal
model.add(Dense(3, activation="softmax"))
In [10]:
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 62, 62, 32)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 31, 31, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 29, 29, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 9, 9, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 9, 9, 32)          18464     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 4, 4, 32)          0         
_________________________________________________________________
flatten (Flatten)            (None, 512)               0         
_________________________________________________________________
dense (Dense)                (None, 32)                16416     
_________________________________________________________________
dense_1 (Dense)              (None, 64)                2112      
_________________________________________________________________
dense_2 (Dense)              (None, 32)                2080      
_________________________________________________________________
dense_3 (Dense)              (None, 3)                 99        
=================================================================
Total params: 57,987
Trainable params: 57,987
Non-trainable params: 0
_________________________________________________________________
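The Flatten size follows from the last pooling layer: 4 × 4 × 32 = 512 features, which feed the first Dense layer for 512 × 32 + 32 = 16,416 parameters, matching the summary above.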
In [11]:
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=['accuracy'])
In [22]:
# Stop training once val_loss has not improved for 10 epochs, keeping the best weights
my_callback = [tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)]
In [23]:
history = model.fit(
    train_data_generator,
    steps_per_epoch=len(train_data_generator),
    epochs=50,
    validation_data=validation_data_generator,
    validation_steps=len(validation_data_generator),
    callbacks=my_callback,
)
Epoch 1/50
61/61 [==============================] - 5s 86ms/step - loss: 0.8094 - accuracy: 0.6567 - val_loss: 0.9506 - val_accuracy: 0.5514
Epoch 2/50
61/61 [==============================] - 5s 85ms/step - loss: 0.7126 - accuracy: 0.7175 - val_loss: 0.9790 - val_accuracy: 0.5794
Epoch 3/50
61/61 [==============================] - 5s 84ms/step - loss: 0.5950 - accuracy: 0.7794 - val_loss: 1.0906 - val_accuracy: 0.5701
Epoch 4/50
61/61 [==============================] - 5s 84ms/step - loss: 0.4607 - accuracy: 0.8330 - val_loss: 1.1657 - val_accuracy: 0.5701
Epoch 5/50
61/61 [==============================] - 5s 84ms/step - loss: 0.3534 - accuracy: 0.8639 - val_loss: 1.0799 - val_accuracy: 0.5607
Epoch 6/50
61/61 [==============================] - 5s 84ms/step - loss: 0.2662 - accuracy: 0.9031 - val_loss: 0.9420 - val_accuracy: 0.6449
Epoch 7/50
61/61 [==============================] - 5s 84ms/step - loss: 0.2679 - accuracy: 0.8959 - val_loss: 1.0566 - val_accuracy: 0.6168
Epoch 8/50
61/61 [==============================] - 5s 85ms/step - loss: 0.2127 - accuracy: 0.9165 - val_loss: 0.9344 - val_accuracy: 0.6355
Epoch 9/50
61/61 [==============================] - 5s 84ms/step - loss: 0.1432 - accuracy: 0.9454 - val_loss: 0.9446 - val_accuracy: 0.6542
Epoch 10/50
61/61 [==============================] - 5s 84ms/step - loss: 0.1322 - accuracy: 0.9546 - val_loss: 0.7939 - val_accuracy: 0.7009
Epoch 11/50
61/61 [==============================] - 5s 85ms/step - loss: 0.0943 - accuracy: 0.9701 - val_loss: 0.7528 - val_accuracy: 0.7570
Epoch 12/50
61/61 [==============================] - 5s 84ms/step - loss: 0.0794 - accuracy: 0.9701 - val_loss: 0.7595 - val_accuracy: 0.7570
Epoch 13/50
61/61 [==============================] - 5s 85ms/step - loss: 0.0640 - accuracy: 0.9722 - val_loss: 1.0085 - val_accuracy: 0.6822
Epoch 14/50
61/61 [==============================] - 5s 84ms/step - loss: 0.0499 - accuracy: 0.9825 - val_loss: 0.8612 - val_accuracy: 0.7664
Epoch 15/50
61/61 [==============================] - 5s 84ms/step - loss: 0.0339 - accuracy: 0.9897 - val_loss: 1.0165 - val_accuracy: 0.6822
Epoch 16/50
61/61 [==============================] - 5s 84ms/step - loss: 0.0200 - accuracy: 0.9938 - val_loss: 1.0724 - val_accuracy: 0.7383
Epoch 17/50
61/61 [==============================] - 5s 85ms/step - loss: 0.0194 - accuracy: 0.9948 - val_loss: 1.1354 - val_accuracy: 0.7664
Epoch 18/50
61/61 [==============================] - 5s 84ms/step - loss: 0.0196 - accuracy: 0.9969 - val_loss: 1.0308 - val_accuracy: 0.7850
Epoch 19/50
61/61 [==============================] - 5s 85ms/step - loss: 0.0173 - accuracy: 0.9948 - val_loss: 1.2284 - val_accuracy: 0.7570
Epoch 20/50
61/61 [==============================] - 5s 84ms/step - loss: 0.0198 - accuracy: 0.9959 - val_loss: 1.2545 - val_accuracy: 0.7290
Epoch 21/50
61/61 [==============================] - 5s 85ms/step - loss: 0.0198 - accuracy: 0.9948 - val_loss: 1.0352 - val_accuracy: 0.7383
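Training stopped after epoch 21: val_loss last improved at epoch 11 (0.7528), so after 10 further epochs without improvement EarlyStopping halted training and restored the weights from epoch 11.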

Preparing Test Data for prediction

In [24]:
# The test generator only rescales; it has no validation split, so no subset argument
datagen_test = ImageDataGenerator(rescale=1/255.0)
test_data_generator = datagen_test.flow_from_directory(
    directory="Lung_cancer_dataset/Test",
    target_size=(img_width, img_height),
    color_mode="grayscale",
    class_mode="categorical",
    batch_size=10,
)
Found 20 images belonging to 3 classes.
In [25]:
test_data_generator.next()[1]
Out[25]:
array([[0., 0., 1.],
       [0., 0., 1.],
       [1., 0., 0.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [1., 0., 0.],
       [0., 0., 1.],
       [1., 0., 0.],
       [0., 0., 1.]], dtype=float32)

Model accuracy and loss trends

Let's visualize the accuracy and loss trends over the course of training.

In [26]:
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
In [27]:
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
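The validation loss rises after epoch 11 while training accuracy keeps climbing, a sign of overfitting that the commented-out Dropout layers could mitigate. For a single aggregate score on the held-out test set, a minimal sketch using Keras's evaluate (not part of the original run; numbers will vary between runs):

# Categorical cross-entropy loss and accuracy on the test generator
test_loss, test_acc = model.evaluate(test_data_generator)
print(f"Test loss: {test_loss:.4f}, test accuracy: {test_acc:.4f}")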

Visualizing the predictions of the model on unseen data

In [28]:
x, y = test_data_generator.next()
pred_array = []
for i in range(5):
    # Add a batch dimension so the model sees shape (1, 64, 64, 1)
    img = x[i].reshape(-1, 64, 64, 1)
    pred_val = model.predict(img)
    # argmax over the softmax output gives the predicted class index
    pred_array.append(np.argmax(pred_val))
In [29]:
# Making the output meaningful using named classes
cell_dict = {0: "Benign", 1: "Malignant", 2: "Normal"}

# Decode the one-hot ground truth and the predicted indices into class names
actual_val = {i: cell_dict[np.argmax(arr)] for i, arr in enumerate(y[:5])}
predictions = {i: cell_dict[pred] for i, pred in enumerate(pred_array)}

print("ACTUAL:", actual_val)
print("PREDICTIONS:", predictions)
ACTUAL: {0: 'Benign', 1: 'Normal', 2: 'Malignant', 3: 'Malignant', 4: 'Benign'}
PREDICTIONS: {0: 'Normal', 1: 'Normal', 2: 'Malignant', 3: 'Malignant', 4: 'Normal'}
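In this batch the model labels both benign slices as normal; benign is the smallest class (15 of 110 cases), which plausibly makes it the hardest to learn.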
In [30]:
plt.figure(figsize=(20, 20))
for i in range(5):
    plt.subplot(5, 5, i+1)
    # Squeeze away the channel axis so matplotlib accepts the (64, 64) image
    plt.imshow(x[i].reshape(64, 64), cmap="binary")
    plt.title('Original: {}, Predicted: {}'.format(actual_val[i], predictions[i]))
    plt.axis('off')

plt.show()
In [31]:
model.save("lung_cancer_prediction.h5")
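To verify the saved artifact, the model can be reloaded from the HDF5 file:

# Reload the saved model; its predictions should match the in-memory model
reloaded_model = tf.keras.models.load_model("lung_cancer_prediction.h5")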

DeepCC
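deepCC is the compiler from Cainvas's deepC toolchain; it converts the saved .h5 model into code that can run on edge devices and microcontrollers, and is typically invoked with the saved model file as its argument.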

In [ ]:
!deepCC lung_cancer_prediction.h5