Breast Cancer Detection Using Deep Learning¶
Credit: AITS Cainvas Community
Photo by Shreya Damle on Dribbble
In this notebook, we will build a CNN model to detect breast cancer using the Breast Cancer Wisconsin (Diagnostic) Data Set¶
!wget https://cainvas-static.s3.amazonaws.com/media/user_data/jayc/data.csv
Importing the necessary libraries for model building¶
Let's import the libraries that will be used throughout this project.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense, Dropout, BatchNormalization
from tensorflow.keras.layers import Conv1D, MaxPool1D
from tensorflow.keras.optimizers import Adam
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
Loading the data and looking into some insights¶
In this section we will load the Breast Cancer Wisconsin dataset and extract some basic information from it.
cancer = datasets.load_breast_cancer()
We will use a pandas DataFrame to present all of our data.
df = pd.DataFrame(data = cancer.data, columns=cancer.feature_names)
df.head()
Let's find the correlation between some columns¶
We use a heatmap to visualize the correlations between the first 10 columns (the mean-value features).
featureMeans = list(df.columns[0:10])
plt.figure(figsize=(10,10))
sns.heatmap(df[featureMeans].corr(), annot=True, square=True, cmap='coolwarm')
plt.show()
Description of data¶
The describe() method can be used to extract a statistical summary of the various fields in the dataset.
df.describe()
Data Splitting and Standardization¶
The data needs to be split into training and testing sets. Furthermore, we need to standardize the inputs before fitting the model.
x=df
x.shape
y=cancer.target
y.shape
cancer.target_names
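As a quick sanity check (an addition, not in the original notebook), let's look at the class balance; in scikit-learn's encoding, target 0 is malignant and 1 is benign.
# Class balance of the dataset (uses y and cancer from the cells above)
for name, count in zip(cancer.target_names, np.bincount(y)):
    print(name, count)   # expect 212 malignant and 357 benign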
We will be using 80% of our dataset for training purposes and 20% for testing.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 0, stratify = y)
x_train.shape
x_test.shape
StandardScaler removes the mean and scales the data to unit variance.
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
# Conv1D expects 3D input: (samples, steps, channels)
x_train = x_train.reshape(x_train.shape[0], 30, 1)
x_test = x_test.reshape(x_test.shape[0], 30, 1)
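Before building the model, a small added check (not part of the original notebook) confirms the scaling and reshaping behaved as expected:
# After StandardScaler, the training features should have ~zero mean and unit variance;
# the shapes should match what Conv1D expects
print(x_train.shape, x_test.shape)
print(round(float(x_train.mean()), 6), round(float(x_train.std()), 6))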
Building the CNN Model¶
epochs = 50
model = Sequential()
model.add(Conv1D(filters=32, kernel_size=2, activation='relu', input_shape = (30,1)))
model.add(BatchNormalization())
model.add(Dropout(0.2))
model.add(Conv1D(filters=64, kernel_size=2, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
Model Summary¶
model.summary()
compile() defines the loss function, the optimizer, and the metrics.
model.compile(optimizer=Adam(learning_rate=0.00005), loss='binary_crossentropy', metrics=['accuracy'])
Now, let's fit the model
history = model.fit(x_train, y_train, epochs=epochs, validation_data=(x_test, y_test), verbose=1)
def plot_learningCurve(history, epoch):
    # Plot training & validation accuracy values
    epoch_range = range(1, epoch + 1)
    plt.plot(epoch_range, history.history['accuracy'])
    plt.plot(epoch_range, history.history['val_accuracy'])
    plt.title('Model accuracy')
    plt.ylabel('Accuracy')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Val'], loc='upper left')
    plt.show()

    # Plot training & validation loss values
    plt.plot(epoch_range, history.history['loss'])
    plt.plot(epoch_range, history.history['val_loss'])
    plt.title('Model loss')
    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Val'], loc='upper left')
    plt.show()
The fit() call returns a History object that contains all of the information collected during training.
history.history
Plotting the curves using the function defined above¶
plot_learningCurve(history, epochs)
- In the model accuracy graph, validation accuracy stays at or above training accuracy, which suggests the model is not overfitting. (Dropout and batch normalization behave differently at training time, which can make validation metrics look slightly better.)
- Similarly, validation loss stays below training loss; as long as validation loss does not climb above training loss, we can keep training the model (a callback-based version of this rule is sketched below).
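If we wanted to automate that stopping rule rather than watching the curves, Keras ships an EarlyStopping callback. A minimal sketch (not used in the training run above):
# Stop training once validation loss stops improving, and keep the best weights
early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',         # watch the validation loss
    patience=5,                 # tolerate 5 epochs without improvement
    restore_best_weights=True   # roll back to the best epoch
)
# history = model.fit(x_train, y_train, epochs=epochs,
#                     validation_data=(x_test, y_test),
#                     callbacks=[early_stop], verbose=1)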
Making predictions on some values¶
# predict_classes was removed in recent TensorFlow releases; threshold the sigmoid output instead
test_predictions = (model.predict(x_test) > 0.5).astype('int32').ravel()
print(test_predictions[:10])
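Since sklearn's metrics module was imported at the top, we can also summarize test-set performance with a confusion matrix and classification report (an added evaluation step, not in the original notebook):
# Compare predicted classes against the true test labels
print(metrics.confusion_matrix(y_test, test_predictions))
print(metrics.classification_report(y_test, test_predictions, target_names=cancer.target_names))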
# Saving the model
model.save('Breast_Cancer_Detection.h5')
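To confirm the saved file is usable, a quick round-trip check (added here) reloads the model and verifies it produces the same outputs:
# Reload the saved model and compare its predictions with the in-memory model
reloaded = keras.models.load_model('Breast_Cancer_Detection.h5')
print(np.allclose(reloaded.predict(x_test), model.predict(x_test)))   # expect True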
deepCC¶
deepCC, the Cainvas deep learning compiler, converts the saved model for deployment on edge devices.
!deepCC 'Breast_Cancer_Detection.h5'