Fetal health detection¶
Credit: AITS Cainvas Community
Photo by Vivi Garleone on Dribbble
Classifying fetal health in order to prevent child and maternal mortality.
The United Nations' Sustainable Development Goals treat the reduction of child mortality as an indicator of human progress. This goal also covers maternal mortality.
Most of the recorded losses have occurred in low-resource regions and could have been prevented.
Cardiotocography (CTG) measures the fetal heart rate, fetal movements and uterine contractions, thus continuously monitoring the health of the mother and child. The equipment used for the monitoring is called a cardiotocograph and works using ultrasound pulses. It is a simple and cost-effective way of assessing fetal health, allowing professionals to take the necessary action.
import pandas as pd
import numpy as np
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import random
Dataset¶
Citation: Ayres de Campos et al. (2000) SisPorto 2.0: A Program for Automated Analysis of Cardiotocograms. J Matern Fetal Med 5:311-318
The dataset has 2126 samples containing features extracted from cardiotocogram exams.
The data was labelled by expert obstetricians into 3 classes.
df = pd.read_csv('https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/fetal_health.csv')
df
# Here are the class labels according to the metadata
class_names = ['Normal', 'Suspect', 'Pathological']
# Let's see the spread of values across classes
df['fetal_health'].value_counts()
This is a heavily unbalanced dataset.
There are two options for balancing it:
- upsampling - resample the smaller classes with replacement until each has as many samples as the largest class (here, 1655).
- downsampling - pick n samples from each class, where n is the number of samples in the smallest class (here, 176).
Here, we will be upsampling.
# separating into 3 dataframes, one for each class
df1 = df[df['fetal_health'] == 1.0]
df2 = df[df['fetal_health'] == 2.0]
df3 = df[df['fetal_health'] == 3.0]
print("Number of samples in:")
print("Class label 1 - ", len(df1))
print("Class label 2 - ", len(df2))
print("Class label 3 - ", len(df3))
# Upsampling
df2 = df2.sample(len(df1), replace = True) # replace = True samples with replacement, so rows can repeat
df3 = df3.sample(len(df1), replace = True)
print('\nAfter resampling - ')
print("Number of samples in:")
print("Class label 1 - ", len(df1))
print("Class label 2 - ", len(df2))
print("Class label 3 - ", len(df3))
# concatenate to form a single dataframe
dfx = pd.concat([df1, df2, df3]) # DataFrame.append was removed in pandas 2.0
print('Total number of samples - ', len(dfx))
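For comparison, the downsampling option mentioned earlier could be sketched as follows. This is a minimal illustration on a made-up toy dataframe (the `toy` frame and its `feature` column are assumptions, not part of the dataset); it is not the approach used in this notebook.

```python
import pandas as pd

# Toy stand-in for the real dataframe (values are made up for illustration)
toy = pd.DataFrame({
    "feature": range(10),
    "fetal_health": [1.0] * 6 + [2.0] * 3 + [3.0] * 1,
})

# n = number of samples in the smallest class
n = int(toy["fetal_health"].value_counts().min())

# Draw n rows without replacement from every class
downsampled = pd.concat(
    [group.sample(n, random_state=0) for _, group in toy.groupby("fetal_health")]
)

print(downsampled["fetal_health"].value_counts())
```

Downsampling discards most of the majority class, which is why upsampling is often preferred when the smallest class is as small as it is here.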
Preprocessing¶
One hot encoding¶
# Defining input and output columns
inputc = dfx.columns[:-1]
outputc = [1, 2, 3] # to be used after one hot encoding
print("Input columns - ", list(inputc))
print("\nOutput columns - ", outputc)
Since this is a classification problem, the target, which is currently stored as a numeric class label, should be one-hot encoded.
y = pd.get_dummies(dfx.fetal_health)
y
# adding as columns to the dataframe
for x in outputc:
    dfx[x] = y[x]
dfx
# as said before, the output columns are labelled 1, 2, 3
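To make the encoding concrete, here is a small sketch on made-up labels showing what `pd.get_dummies` produces and how a one-hot row maps back to a class name. The `labels` series is hypothetical; `class_names` is the list defined earlier.

```python
import numpy as np
import pandas as pd

class_names = ['Normal', 'Suspect', 'Pathological']

labels = pd.Series([1.0, 3.0, 2.0, 1.0])  # made-up labels
one_hot = pd.get_dummies(labels)

# Columns appear in sorted label order: 1.0, 2.0, 3.0
print(list(one_hot.columns))

# argmax over each row gives the column position (0, 1 or 2),
# which indexes directly into class_names
positions = np.argmax(one_hot.to_numpy().astype(int), axis=1)
decoded = [class_names[p] for p in positions]
print(decoded)
```

This is the same argmax step used later to turn the model's softmax output into a label.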
Train test split¶
# Splitting into train and test using 80-20 split
traindf, testdf = train_test_split(dfx.sample(frac=1), test_size = 0.2) # shuffling the dataframe before splitting
print('Number of samples in:')
print('Train set - ' , len(traindf))
print('Test set - ', len(testdf))
# Splitting into X and y arrays for preprocessing purposes
Xtrain, ytrain = traindf[inputc], traindf[outputc]
Xtest, ytest = testdf[inputc], testdf[outputc]
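One caveat with splitting after upsampling: because the minority classes were resampled with replacement, copies of the same original row can land in both the train and the test set, which tends to make the test score optimistic. A possible alternative, sketched here on a made-up toy dataframe (the `toy` frame and its `feature` column are assumptions), is to hold out the test rows first and upsample only the training portion.

```python
import pandas as pd

# Toy stand-in for the real dataframe (values are made up for illustration)
toy = pd.DataFrame({
    "feature": range(20),
    "fetal_health": [1.0] * 12 + [2.0] * 5 + [3.0] * 3,
})

# 1. Hold out 20% of the original rows first
test_part = toy.sample(frac=0.2, random_state=0)
train_part = toy.drop(test_part.index)

# 2. Upsample the smaller classes inside the training portion only
target = int(train_part["fetal_health"].value_counts().max())
parts = []
for _, group in train_part.groupby("fetal_health"):
    if len(group) < target:
        group = group.sample(target, replace=True, random_state=0)
    parts.append(group)
balanced_train = pd.concat(parts)

# No original row appears in both sets
overlap = set(balanced_train["feature"]) & set(test_part["feature"])
print(len(overlap))
```

With this ordering the test set keeps the original class imbalance, so per-class metrics become more informative than plain accuracy.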
Scaling the values¶
# Each feature has a different range.
# Using min_max_scaler to scale them to values in the range [0,1].
min_max_scaler = MinMaxScaler()
# Fit on training set alone
Xtrain = min_max_scaler.fit_transform(Xtrain)
# Use it to transform the test input
Xtest = min_max_scaler.transform(Xtest)
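The arithmetic behind min-max scaling can be sketched with plain NumPy. Each column is mapped with x' = (x - min) / (max - min), where min and max come from the training set only. The arrays below are made up, and the helper `scale` is a hypothetical stand-in that mirrors what `MinMaxScaler` does with its default range; it is not the library's implementation.

```python
import numpy as np

train = np.array([[10.0, 0.1], [20.0, 0.3], [30.0, 0.5]])  # made-up values
test = np.array([[25.0, 0.2], [40.0, 0.0]])

# "fit": remember the per-column min and max of the training set
col_min = train.min(axis=0)
col_max = train.max(axis=0)

def scale(x):
    # x' = (x - min) / (max - min), applied column by column
    return (x - col_min) / (col_max - col_min)

train_scaled = scale(train)
test_scaled = scale(test)
print(train_scaled)
print(test_scaled)
```

Note that test values outside the training range fall outside [0, 1] after scaling, which is expected when the scaler is fit on the training set alone.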
The model¶
model = Sequential([
    Dense(128, activation = 'relu'),
    Dense(64, activation = 'relu'),
    Dense(3, activation = 'softmax'),
])
# training with a learning rate of 0.01
model.compile(optimizer = Adam(0.01), loss = 'categorical_crossentropy', metrics = ['accuracy'])
history1 = model.fit(Xtrain, ytrain, validation_data= (Xtest, ytest), epochs = 64)
# training with learning rate of 0.001
model.compile(optimizer = Adam(0.001), loss = 'categorical_crossentropy', metrics = ['accuracy'])
history2 = model.fit(Xtrain, ytrain, validation_data= (Xtest, ytest), epochs = 64)
model.summary()
model.evaluate(Xtest, ytest)
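Accuracy alone does not show which classes get confused with each other. A minimal sketch of a confusion matrix and per-class recall follows; the arrays `y_true` and `y_prob` are made up and merely stand in for `ytest` and `model.predict(Xtest)`.

```python
import numpy as np

class_names = ['Normal', 'Suspect', 'Pathological']

# Made-up one-hot targets and softmax-like outputs, standing in for
# ytest and model.predict(Xtest)
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]])
y_prob = np.array([
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
])

true_idx = y_true.argmax(axis=1)
pred_idx = y_prob.argmax(axis=1)

# confusion[i, j] counts samples of true class i predicted as class j
confusion = np.zeros((3, 3), dtype=int)
for t, p in zip(true_idx, pred_idx):
    confusion[t, p] += 1

# Recall per class: correct predictions / samples of that class
recall = confusion.diagonal() / confusion.sum(axis=1)
for name, r in zip(class_names, recall):
    print(name, round(float(r), 2))
```

For a screening task like this one, recall on the Suspect and Pathological classes is usually the number to watch.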
Plotting the metrics¶
def plot(history1, history2, variable1, variable2):
    # combining metrics from both trainings
    # (concatenate into new lists so the history dicts are not modified in place)
    var1_history = history1[variable1] + history2[variable1]
    var2_history = history1[variable2] + history2[variable2]
    # plotting them
    plt.plot(range(len(var1_history)), var1_history)
    plt.plot(range(len(var2_history)), var2_history)
    plt.legend([variable1, variable2])
    plt.title(variable1)
    plt.show()
plot(history1.history, history2.history, "accuracy", 'val_accuracy')
plot(history1.history, history2.history, "loss", 'val_loss')
Prediction¶
# pick random test data sample from one batch
x = random.randint(0, len(Xtest) - 1)
output = model.predict(Xtest[x].reshape(1, -1)) # getting output; input shape (n_features,) --> (1, n_features)
pred = np.argmax(output[0]) # finding max
print("Predicted: ", class_names[pred]) # Picking the label from class_names based on the model output
output_true = np.array(ytest)[x]
print("True: ", class_names[np.argmax(output_true)])
print("Probability: ", output[0][pred])
deepC¶
model.save('fetal_health.h5')
!deepCC fetal_health.h5
# pick random test data sample from one batch
x = random.randint(0, len(Xtest) - 1)
np.savetxt('sample.data', Xtest[x]) # xth sample into text file
# run exe with input
!fetal_health_deepC/fetal_health.exe sample.data
# show predicted output
nn_out = np.loadtxt('deepSea_result_1.out')
pred = np.argmax(nn_out) # finding max
print("Predicted: ", class_names[pred]) # Picking the label from class_names based on the model output
output_true = np.array(ytest)[x]
print("True: ", class_names[np.argmax(output_true)])
print("Probability: ", nn_out[pred])