
Epileptic Seizure Recognition

Credit: AITS Cainvas Community

Photo by Epilepsy Foundation on YouTube

A sudden rush of electrical activity in the brain is called a seizure. Epilepsy is a chronic neurological disorder that causes involuntary, recurrent seizures.

Seizures can be either generalized (affecting the whole brain) or focal (affecting one part of the brain).

Deep learning can be used to detect and monitor seizures in patients, and IoT devices make it easy to integrate such models with existing health systems and patient wearables.

In [1]:
import numpy as np
import pandas as pd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix
import random
import matplotlib.pyplot as plt

The dataset

Andrzejak RG, Lehnertz K, Rieke C, Mormann F, David P, Elger CE (2001) Indications of nonlinear deterministic and finite dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state, Phys. Rev. E, 64, 061907

The CSV file consists of processed EEG recordings of patients at different points in time.

The original dataset has 5 folders, each with 100 files, and each file represents a single subject/person. Each file is a recording of brain activity for 23.6 seconds, sampled into 4097 data points. Each recording was then divided and shuffled into 23 chunks of 178 data points (1 second each), where each data point is the value of the EEG signal at a point in time. Thus we have 23 x 500 = 11500 samples, each containing 178 data points covering 1 second.
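
As a minimal sketch of this chunking, the following illustrates how one 4097-point recording maps to 23 one-second rows (here, 'recording' is a hypothetical random placeholder, not part of the dataset):

recording = np.random.randn(4097)                 # stand-in for one subject's EEG signal
chunks = recording[:23 * 178].reshape(23, 178)    # 23 chunks x 178 samples; the last 3 points are dropped
print(chunks.shape)                               # (23, 178); 500 subjects x 23 chunks = 11500 rows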

The last column contains the categorical variable with the following values -

  • 5 - Recording of the EEG signal when the patient had their eyes open

  • 4 - Recording of the EEG signal when the patient had their eyes closed

  • 3 - Recording of EEG activity from a healthy brain area

  • 2 - Recording of EEG activity from the area where the tumor was located

  • 1 - Recording of seizure activity

Here, only label 1 corresponds to seizure activity.

We will train the model to identify seizure activity (label 1) as opposed to the rest of the classes.

In [2]:
df = pd.read_csv('https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/Epileptic_Seizure_Recognition.csv')
df
Out[2]:
Unnamed X1 X2 X3 X4 X5 X6 X7 X8 X9 ... X170 X171 X172 X173 X174 X175 X176 X177 X178 y
0 X21.V1.791 135 190 229 223 192 125 55 -9 -33 ... -17 -15 -31 -77 -103 -127 -116 -83 -51 4
1 X15.V1.924 386 382 356 331 320 315 307 272 244 ... 164 150 146 152 157 156 154 143 129 1
2 X8.V1.1 -32 -39 -47 -37 -32 -36 -57 -73 -85 ... 57 64 48 19 -12 -30 -35 -35 -36 5
3 X16.V1.60 -105 -101 -96 -92 -89 -95 -102 -100 -87 ... -82 -81 -80 -77 -85 -77 -72 -69 -65 5
4 X20.V1.54 -9 -65 -98 -102 -78 -48 -16 0 -21 ... 4 2 -12 -32 -41 -65 -83 -89 -73 5
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
11495 X22.V1.114 -22 -22 -23 -26 -36 -42 -45 -42 -45 ... 15 16 12 5 -1 -18 -37 -47 -48 2
11496 X19.V1.354 -47 -11 28 77 141 211 246 240 193 ... -65 -33 -7 14 27 48 77 117 170 1
11497 X8.V1.28 14 6 -13 -16 10 26 27 -9 4 ... -65 -48 -61 -62 -67 -30 -2 -1 -8 5
11498 X10.V1.932 -40 -25 -9 -12 -2 12 7 19 22 ... 121 135 148 143 116 86 68 59 55 3
11499 X16.V1.210 29 41 57 72 74 62 54 43 31 ... -59 -25 -4 2 5 4 -2 2 20 4

11500 rows × 180 columns
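
To get a feel for the signals, one can plot a single row of the dataframe (an optional sketch; row 1 in the table above happens to be a seizure sample, y = 1):

plt.plot(df.iloc[1, 1:-1].values)    # columns X1..X178 of one sample
plt.xlabel('Sample index (178 points = 1 second)')
plt.ylabel('EEG value')
plt.title('One 1-second EEG segment (y = 1)')
plt.show()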

As we will be classifying samples into two categories, epileptic (label 1) and non-epileptic (labels 2, 3, 4, 5), we will change the labels in the dataframe.

In [3]:
df['y'] = (df['y'] == 1).astype('int')
df['y']
Out[3]:
0        0
1        1
2        0
3        0
4        0
        ..
11495    0
11496    1
11497    0
11498    0
11499    0
Name: y, Length: 11500, dtype: int64
In [4]:
# The spread of labels in the dataframe
df['y'].value_counts()
Out[4]:
0    9200
1    2300
Name: y, dtype: int64

This is an unbalanced dataset.

In [5]:
# Defining a list with class names corresponding to list indices
class_names = ['Non-epileptic', 'Epileptic']

Preprocessing

Resampling

In order to balance the dataset, there are two options:

  • upsampling - resample the minority class (with replacement) until its count equals that of the majority class (here, 9200).
  • downsampling - pick n samples from each class, where n = the number of samples in the class with the lowest count (here, 2300).

Here, we will be upsampling; a sketch of the downsampling alternative follows for reference.
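
Assuming df0 and df1 are the per-class dataframes defined in the next cell, downsampling would look like this (kept commented out so it is not executed):

# Downsampling sketch (not used in this notebook)
# df0_down = df0.sample(len(df1))         # 2300 samples from class 0
# df_down = pd.concat([df0_down, df1])    # 4600 samples in total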

In [6]:
# separating into 2 dataframes, one for each class 

df1 = df[df['y'] == 1]
df0 = df[df['y'] == 0]
In [7]:
print("Number of samples in:")
print("Class label 0 - ", len(df0))
print("Class label 1 - ", len(df1))

# Upsampling 

df1 = df1.sample(len(df0), replace = True)    # replace = True enables sampling with replacement

print('\nAfter resampling - ')

print("Number of samples in:")
print("Class label 0 - ", len(df0))
print("Class label 1 - ", len(df1))
Number of samples in:
Class label 0 -  9200
Class label 1 -  2300

After resampling - 
Number of samples in:
Class label 0 -  9200
Class label 1 -  9200
In [8]:
# concatenate to form a single dataframe

df = pd.concat([df0, df1])    # df.append is deprecated in newer pandas

print('Total number of samples - ', len(df))
Total number of samples -  18400
In [9]:
# defining the input and output columns to separate the dataset in the later cells.

input_columns = list(df.columns[1:-1])    # excluding the first 'Unnamed' column
output_columns = [df.columns[-1]]         # the label column 'y'

print("Number of input columns: ", len(input_columns))
#print("Input columns: ", ', '.join(input_columns))

print("Number of output columns: ", len(output_columns))
#print("Output columns: ", ', '.join(output_columns))
Number of input columns:  178
Number of output columns:  1

Train test split

In [10]:
# Splitting into train, val and test set -- 80-10-10 split

# First, an 80-20 split
train_df, val_test_df = train_test_split(df, test_size = 0.2)

# Then split the 20% into half
val_df, test_df = train_test_split(val_test_df, test_size = 0.5)

print("Number of samples in...")
print("Training set: ", len(train_df))
print("Validation set: ", len(val_df))
print("Testing set: ", len(test_df))
Number of samples in...
Training set:  14720
Validation set:  1840
Testing set:  1840
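
Since the classes are balanced after resampling, a plain random split works well here. To preserve the class proportions exactly in each subset, train_test_split also accepts a stratify argument (a sketch, not used above):

# Stratified variant of the first split (sketch only)
# train_df, val_test_df = train_test_split(df, test_size = 0.2, stratify = df['y'])
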
In [11]:
# Splitting into X (input) and y (output)

Xtrain, ytrain = np.array(train_df[input_columns]), np.array(train_df[output_columns])

Xval, yval = np.array(val_df[input_columns]), np.array(val_df[output_columns])

Xtest, ytest = np.array(test_df[input_columns]), np.array(test_df[output_columns])

Scaling

In [12]:
df.describe()
Out[12]:
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 ... X170 X171 X172 X173 X174 X175 X176 X177 X178 y
count 18400.000000 18400.000000 18400.000000 18400.000000 18400.000000 18400.000000 18400.000000 18400.000000 18400.000000 18400.000000 ... 18400.000000 18400.000000 18400.000000 18400.000000 18400.000000 18400.000000 18400.000000 18400.000000 18400.000000 18400.000000
mean -16.578750 -15.004674 -13.156522 -10.387174 -7.266413 -3.701739 -1.555054 -1.283207 -0.629185 0.599022 ... -9.308859 -11.912554 -15.100924 -17.349293 -17.983967 -18.297065 -17.969402 -17.450489 -16.703967 0.500000
std 248.426173 247.695072 242.379106 238.337371 239.944693 242.051001 242.345105 240.873068 239.095047 237.045917 ... 246.741940 249.821872 253.600021 252.445706 250.598385 246.111469 242.207385 240.759606 243.354929 0.500014
min -1839.000000 -1838.000000 -1835.000000 -1687.000000 -1755.000000 -1757.000000 -1832.000000 -1778.000000 -1840.000000 -1867.000000 ... -1867.000000 -1865.000000 -1642.000000 -1723.000000 -1866.000000 -1863.000000 -1781.000000 -1727.000000 -1829.000000 0.000000
25% -83.000000 -83.000000 -82.000000 -81.000000 -79.000000 -76.000000 -74.000000 -73.000000 -73.000000 -72.000000 ... -81.000000 -80.000000 -80.000000 -82.000000 -82.000000 -82.000000 -83.000000 -82.000000 -84.000000 0.000000
50% -10.000000 -9.000000 -8.000000 -8.000000 -9.000000 -8.000000 -7.000000 -7.000000 -5.000000 -4.000000 ... -9.000000 -10.000000 -10.000000 -10.000000 -10.000000 -9.000000 -10.000000 -11.000000 -11.000000 0.500000
75% 56.000000 57.000000 58.000000 59.000000 61.000000 63.000000 62.000000 64.250000 66.000000 66.250000 ... 59.000000 58.000000 57.000000 57.000000 58.000000 57.000000 56.000000 56.000000 54.000000 1.000000
max 1726.000000 1713.000000 1697.000000 1612.000000 1518.000000 1816.000000 2047.000000 2047.000000 2047.000000 2047.000000 ... 1777.000000 1472.000000 1319.000000 1436.000000 1733.000000 1958.000000 2047.000000 2047.000000 1915.000000 1.000000

8 rows × 179 columns

All the features have a similar range of values, but their means indicate that they are skewed differently, so we standardize them.

In [13]:
# Using standard scaler to standardize them to values with mean = 0 and variance = 1.

standard_scaler = StandardScaler()

# Fit on training set alone
Xtrain = standard_scaler.fit_transform(Xtrain)

# Use it to transform val and test input
Xval = standard_scaler.transform(Xval)
Xtest = standard_scaler.transform(Xtest)
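
StandardScaler computes z = (x - mean) / std, with the mean and standard deviation taken from the training set alone. A quick sanity check on the result (a sketch):

print(Xtrain.mean().round(3), Xtrain.std().round(3))    # close to 0.0 and 1.0 after scaling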

The model

In [14]:
model = Sequential([
    Dense(512, activation = 'relu', input_shape = Xtrain[0].shape),
    Dense(256, activation = 'relu'),
    Dense(128, activation = 'relu'),
    Dense(1, activation = 'sigmoid')
])

cb = [EarlyStopping(monitor = 'val_loss', patience = 3, restore_best_weights = True)]
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 512)               91648     
_________________________________________________________________
dense_1 (Dense)              (None, 256)               131328    
_________________________________________________________________
dense_2 (Dense)              (None, 128)               32896     
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 129       
=================================================================
Total params: 256,001
Trainable params: 256,001
Non-trainable params: 0
_________________________________________________________________
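
As a quick check of the parameter counts above, the first Dense layer connects 178 inputs to 512 units:

print(178 * 512 + 512)    # 91648 = weights + biases, matching dense (Dense) above
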
In [15]:
model.compile(optimizer=Adam(0.01), loss='binary_crossentropy', metrics=['accuracy'])

history1 = model.fit(Xtrain, ytrain, validation_data = (Xval, yval), epochs=16, callbacks = cb)
Epoch 1/16
460/460 [==============================] - 1s 2ms/step - loss: 0.1925 - accuracy: 0.9365 - val_loss: 0.1421 - val_accuracy: 0.9636
Epoch 2/16
460/460 [==============================] - 1s 1ms/step - loss: 0.1609 - accuracy: 0.9567 - val_loss: 0.1116 - val_accuracy: 0.9685
Epoch 3/16
460/460 [==============================] - 1s 1ms/step - loss: 0.1026 - accuracy: 0.9677 - val_loss: 0.1181 - val_accuracy: 0.9489
Epoch 4/16
460/460 [==============================] - 1s 1ms/step - loss: 0.0824 - accuracy: 0.9730 - val_loss: 0.0920 - val_accuracy: 0.9707
Epoch 5/16
460/460 [==============================] - 1s 1ms/step - loss: 0.0732 - accuracy: 0.9774 - val_loss: 0.0825 - val_accuracy: 0.9647
Epoch 6/16
460/460 [==============================] - 1s 1ms/step - loss: 0.1354 - accuracy: 0.9671 - val_loss: 0.1061 - val_accuracy: 0.9663
Epoch 7/16
460/460 [==============================] - 1s 1ms/step - loss: 0.0971 - accuracy: 0.9717 - val_loss: 0.0854 - val_accuracy: 0.9755
Epoch 8/16
460/460 [==============================] - 1s 1ms/step - loss: 0.0647 - accuracy: 0.9796 - val_loss: 0.0696 - val_accuracy: 0.9783
Epoch 9/16
460/460 [==============================] - 1s 1ms/step - loss: 0.0502 - accuracy: 0.9825 - val_loss: 0.0960 - val_accuracy: 0.9766
Epoch 10/16
460/460 [==============================] - 1s 1ms/step - loss: 0.0569 - accuracy: 0.9827 - val_loss: 0.0785 - val_accuracy: 0.9766
Epoch 11/16
460/460 [==============================] - 1s 1ms/step - loss: 0.0614 - accuracy: 0.9813 - val_loss: 0.0761 - val_accuracy: 0.9826

Training is then continued at a lower learning rate (0.001) to fine-tune the weights.

In [ ]:
model.compile(optimizer=Adam(0.001), loss='binary_crossentropy', metrics=['accuracy'])

history2 = model.fit(Xtrain, ytrain, validation_data = (Xval, yval), epochs=16, callbacks = cb)
Epoch 1/16
460/460 [==============================] - 1s 1ms/step - loss: 0.0281 - accuracy: 0.9906 - val_loss: 0.0653 - val_accuracy: 0.9826
Epoch 2/16
460/460 [==============================] - 1s 1ms/step - loss: 0.0210 - accuracy: 0.9928 - val_loss: 0.0766 - val_accuracy: 0.9832
In [ ]:
model.evaluate(Xtest, ytest)

Plotting the metrics

In [ ]:
def plot(history1, history2, variable1, variable2):
    # combine metrics from both training runs (copy to avoid mutating the history dicts)
    var1_history = list(history1[variable1]) + list(history2[variable1])
    var2_history = list(history1[variable2]) + list(history2[variable2])

    # plot them against the epoch index
    plt.plot(range(len(var1_history)), var1_history)
    plt.plot(range(len(var2_history)), var2_history)
    plt.legend([variable1, variable2])
    plt.title(variable1)
    plt.show()
In [ ]:
plot(history1.history, history2.history, "accuracy", 'val_accuracy')
In [ ]:
plot(history1.history, history2.history, "loss", 'val_loss')

Model evaluation

In [ ]:
# Row-normalized confusion matrix on the test set
cm = confusion_matrix(ytest, (model.predict(Xtest) > 0.5).astype('int'))
cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]    # normalize each true-class row

# annotate each cell with its proportion
for i in range(cm.shape[1]):
    for j in range(cm.shape[0]):
        plt.text(j, i, format(cm[i, j], '.2f'), horizontalalignment="center", color="black")


plt.imshow(cm, cmap=plt.cm.Blues)
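
Optionally, class-name ticks and axis labels make the matrix easier to read (a small addition, assuming it runs right after the plotting code above):

plt.xticks([0, 1], class_names)
plt.yticks([0, 1], class_names)
plt.xlabel('Predicted')
plt.ylabel('True')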

Balancing the dataset is an important step in achieving high performance.

It is important to note that good results, if not the best, can be achieved with an unbalanced dataset too! Try running the same notebook while skipping the resampling section and compare the results.

Prediction

In [ ]:
# pick a random test data sample
x = random.randint(0, len(Xtest) - 1)

output_true = np.array(ytest)[x][0]
print("True: ", class_names[output_true])

output = model.predict(Xtest[x].reshape(1, -1))[0][0]
pred = int(output > 0.5)    # threshold the sigmoid output at 0.5
print("Predicted: ", class_names[pred], "(", output, "-->", pred, ")")    # pick the label from class_names based on the model output

deepC

In [ ]:
model.save('epileptic_seizure.h5')

!deepCC epileptic_seizure.h5
In [ ]:
# pick a random test data sample
x = random.randint(0, len(Xtest) - 1)

output_true = np.array(ytest)[x][0]
print("True: ", class_names[output_true])

np.savetxt('sample.data', Xtest[x])    # write the chosen sample to a file for the compiled model

# run the compiled binary with the input file
!epileptic_seizure_deepC/epileptic_seizure.exe sample.data
# Pick the label from class_names based on the model output