
Heartbeat Anomaly Detection

Credit: AITS Cainvas Community

According to the WHO, 17.9 million people die each year from cardiovascular diseases. Over the years, it has been found that these deaths can be prevented if the diseases are diagnosed at an early stage. AI has brought major advances to healthcare for the early diagnosis of these diseases.

Coupled with a digital stethoscope or a similar IoT device, this model can help detect anomalies in an individual's heartbeat sounds.

Importing the Dataset

In [1]:
!wget -N "https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/heart.zip"
!unzip -qo heart.zip 
!rm heart.zip
--2020-10-27 07:59:09--  https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/heart.zip
Resolving cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)... 52.219.66.92
Connecting to cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)|52.219.66.92|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 72125176 (69M) [application/zip]
Saving to: ‘heart.zip’

heart.zip           100%[===================>]  68.78M  97.7MB/s    in 0.7s    

2020-10-27 07:59:10 (97.7 MB/s) - ‘heart.zip’ saved [72125176/72125176]

Importing Necessary Libraries

In [2]:
# Pandas
import pandas as pd

# Scikit learn
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
from sklearn.preprocessing import LabelEncoder
from sklearn.utils import shuffle
from sklearn.utils import class_weight

# Keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, Conv2D, MaxPooling2D, GlobalAveragePooling2D
from keras.utils import to_categorical
from keras.optimizers import Adam

# Audio
import librosa
import librosa.display

# Plot
%matplotlib inline
%pylab inline
import matplotlib.pyplot as plt
%config InlineBackend.figure_format = 'retina'

# Utility
import os
import glob
import numpy as np
from tqdm import tqdm
import itertools

# To ignore any warnings
import warnings                       
warnings.filterwarnings("ignore")


# gather software versions
import tensorflow as tf; print('tensorflow version: ', tf.__version__)
import keras; print('keras version: ',keras.__version__)

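# Note: `%pylab inline` replaces sklearn's `shuffle` with `numpy.random.shuffle` (see the warning below).
# Re-import `shuffle` from sklearn.utils after this cell if you need the sklearn version.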
Populating the interactive namespace from numpy and matplotlib
tensorflow version:  2.3.1
keras version:  2.4.3
/opt/tljh/user/lib/python3.7/site-packages/IPython/core/magics/pylab.py:160: UserWarning: pylab import has clobbered these variables: ['shuffle']
`%matplotlib` prevents importing * from pylab and numpy
  "\n`%matplotlib` prevents importing * from pylab and numpy"

Build Dataset

In [3]:
dataset = []
for folder in ["heart/set_a/**"]:
    for filename in glob.iglob(folder):
        if os.path.exists(filename):
            label = os.path.basename(filename).split("_")[0]
            # skip clips shorter than 4 seconds
            if librosa.get_duration(filename=filename)>=4:
                if label not in ["Aunlabelledtest"]:
                    dataset.append({
                        "filename": filename,
                        "label": label
                    })
dataset = pd.DataFrame(dataset)
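
The cell above derives each label from the part of the file name before the first underscore. A minimal sketch of that parsing step, using only the standard library, is shown here; the helper name `label_from_path` and the third file name are hypothetical and only illustrate the filter on `Aunlabelledtest`.

```python
import os

def label_from_path(path):
    """Return the class label encoded in a file name such as 'normal__201106111136.wav'."""
    # Everything before the first underscore is treated as the label.
    return os.path.basename(path).split("_")[0]

# The first two paths are used later in this notebook; the third is a made-up
# name that only illustrates the "Aunlabelledtest" prefix being filtered out.
paths = [
    "heart/set_a/normal__201106111136.wav",
    "heart/set_a/murmur__201108222231.wav",
    "heart/set_a/Aunlabelledtest__000000000000.wav",  # hypothetical file name
]
labels = [label_from_path(p) for p in paths]
kept = [lab for lab in labels if lab != "Aunlabelledtest"]
print(labels)
print(kept)
```

Because the label comes purely from the file name, a misnamed file would be silently mislabelled, so it may be worth cross-checking against the `label` column of `set_a.csv`.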

Exploratory Data Analysis

In [4]:
dataset.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 93 entries, 0 to 92
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   filename  93 non-null     object
 1   label     93 non-null     object
dtypes: object(2)
memory usage: 1.6+ KB
In [5]:
plt.figure(figsize=(12,6))
dataset.label.value_counts().plot(kind='bar', title="Dataset distribution")
plt.show()
In [6]:
# parent folder of sound files
INPUT_DIR="heart"
# 16 kHz
SAMPLE_RATE = 16000
# seconds
MAX_SOUND_CLIP_DURATION=12 
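
These two constants suggest that every clip is eventually brought to a common length of `SAMPLE_RATE * MAX_SOUND_CLIP_DURATION` samples. The notebook does not show that step at this point, so the following is only a sketch under that assumption; `fix_length` is a hypothetical helper, and the clips are synthetic arrays rather than real recordings.

```python
import numpy as np

SAMPLE_RATE = 16000            # Hz, as defined above
MAX_SOUND_CLIP_DURATION = 12   # seconds, as defined above

def fix_length(samples, sample_rate=SAMPLE_RATE, max_seconds=MAX_SOUND_CLIP_DURATION):
    """Zero-pad or truncate a 1-D signal to exactly sample_rate * max_seconds samples."""
    target = sample_rate * max_seconds
    samples = np.asarray(samples, dtype=np.float32)
    if len(samples) >= target:
        # Longer clips are cut off at the target length.
        return samples[:target]
    # Shorter clips are padded with trailing zeros (silence).
    return np.pad(samples, (0, target - len(samples)), mode="constant")

short_clip = np.ones(5 * SAMPLE_RATE, dtype=np.float32)   # synthetic 5-second clip
long_clip = np.ones(20 * SAMPLE_RATE, dtype=np.float32)   # synthetic 20-second clip
print(fix_length(short_clip).shape, fix_length(long_clip).shape)
```

Padding with silence keeps all inputs the same shape, which a convolutional model needs; whether silence or repetition of the clip works better for heart sounds is a design choice left open here.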
In [7]:
set_a=pd.read_csv(INPUT_DIR+"/set_a.csv")
set_a.head()
set_a.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 176 entries, 0 to 175
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   dataset   176 non-null    object 
 1   fname     176 non-null    object 
 2   label     124 non-null    object 
 3   sublabel  0 non-null      float64
dtypes: float64(1), object(3)
memory usage: 5.6+ KB
In [8]:
train_ab=set_a
train_ab.describe()
Out[8]:
sublabel
count 0.0
mean NaN
std NaN
min NaN
25% NaN
50% NaN
75% NaN
max NaN
In [9]:
# get all unique labels (rows without a label show up as nan and are counted as a fifth "class")
nb_classes=train_ab.label.unique()

print("Number of training examples=", train_ab.shape[0], "  Number of classes=", len(train_ab.label.unique()))
print (nb_classes)
Number of training examples= 176   Number of classes= 5
['artifact' 'extrahls' 'murmur' 'normal' nan]
In [10]:
print('Minimum samples per category = ', min(train_ab.label.value_counts()))
print('Maximum samples per category = ', max(train_ab.label.value_counts()))
Minimum samples per category =  19
Maximum samples per category =  40
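
With 19 samples in the smallest category and 40 in the largest, the classes are noticeably imbalanced. One common remedy is to weight each class inversely to its frequency, which is what the "balanced" heuristic in `sklearn.utils.class_weight` does. The sketch below reproduces that formula in plain Python; `balanced_class_weights` is a hypothetical helper, and the per-class split is assumed for illustration (only the minimum, the maximum and the total of 124 labelled rows come from the outputs above).

```python
def balanced_class_weights(counts):
    """Map each label to total / (n_classes * count), so rare classes weigh more."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}

# Illustrative counts: only min (19), max (40) and the total of 124 labelled
# rows are taken from the notebook output; the per-class split is assumed.
counts = {"artifact": 40, "extrahls": 19, "murmur": 34, "normal": 31}
weights = balanced_class_weights(counts)
for label, w in weights.items():
    print(f"{label:10s} {w:.3f}")
```

A dictionary like this, keyed by encoded class index instead of label name, could be passed to Keras through the `class_weight` argument of `model.fit`.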

Normal Case

In the Normal category there are normal, healthy heart sounds. A normal heart sound has a clear “lub dub, lub dub” pattern, with the time from “lub” to “dub” shorter than the time from “dub” to the next “lub”.
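
The timing rule above can be written as a simple check. This is only an illustrative sketch: the onset times are hypothetical values, not measurements from the dataset, and `has_normal_rhythm` is not part of the notebook's model.

```python
def has_normal_rhythm(lub_times, dub_times):
    """Check that every lub-to-dub gap is shorter than the following dub-to-lub gap.

    Both arguments are onset times in seconds, alternating lub, dub, lub, ...
    """
    for i in range(len(lub_times) - 1):
        lub_to_dub = dub_times[i] - lub_times[i]
        dub_to_next_lub = lub_times[i + 1] - dub_times[i]
        if lub_to_dub >= dub_to_next_lub:
            return False
    return True

# Hypothetical onset times (seconds), not measured from the dataset.
lubs = [0.00, 0.80, 1.60]
dubs = [0.30, 1.10, 1.90]          # lub->dub 0.30 s, dub->next lub 0.50 s
late_dubs = [0.55, 1.35, 2.15]     # lub->dub 0.55 s, dub->next lub 0.25 s
print(has_normal_rhythm(lubs, dubs))
print(has_normal_rhythm(lubs, late_dubs))
```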

In [11]:
normal_file=INPUT_DIR+"/set_a/normal__201106111136.wav"
In [12]:
# hear it
import IPython.display as ipd
ipd.Audio(normal_file) 
Out[12]:
In [13]:
# Load using the wave module
import wave
wav = wave.open(normal_file)
print("Sampling (frame) rate = ", wav.getframerate())
print("Total samples (frames) = ", wav.getnframes())
print("Duration = ", wav.getnframes()/wav.getframerate())
Sampling (frame) rate =  44100
Total samples (frames) =  218903
Duration =  4.963786848072562
In [14]:
# Load using Librosa
y, sr = librosa.load(normal_file, duration=5)   # librosa resamples to 22050 Hz by default
dur=librosa.get_duration(y)
print ("duration:", dur)
print(y.shape, sr)
duration: 4.963809523809524
(109452,) 22050
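
The two loaders report different sample counts because librosa resamples the 44100 Hz file to 22050 Hz. The arithmetic can be checked by hand, as sketched below; `expected_samples` is a hypothetical helper, and it assumes that the `duration` cap is applied at the original rate and that the resampled length is rounded up.

```python
import math

def expected_samples(n_frames, orig_sr, target_sr=22050, max_duration=None):
    """Estimate how many samples remain after an optional duration cap and resampling."""
    if max_duration is not None:
        # Assumption: the cap is applied to the original signal before resampling.
        n_frames = min(n_frames, int(max_duration * orig_sr))
    # Assumption: the resampled length is rounded up to a whole sample.
    return math.ceil(n_frames * target_sr / orig_sr)

# Frame count and rate reported by the wave module for the normal clip.
n = expected_samples(218903, 44100, target_sr=22050, max_duration=5)
print(n, n / 22050)
```

Under these assumptions the result agrees with the shape and duration printed by librosa above, and the small difference from the `wave` duration comes from the half sample gained by rounding up.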
In [15]:
# librosa plot
plt.figure(figsize=(16, 3))
librosa.display.waveplot(y, sr=sr)
Out[15]:
<matplotlib.collections.PolyCollection at 0x7f55eeaceb00>

Murmur Case

Heart murmurs sound as though there is a “whooshing, roaring, rumbling, or turbulent fluid” noise in one of two temporal locations: (1) between “lub” and “dub”, or (2) between “dub” and “lub”. They can be a symptom of many heart disorders, some serious. There will still be a “lub” and a “dub”.

In [16]:
# murmur case
murmur_file=INPUT_DIR+"/set_a/murmur__201108222231.wav"
y2, sr2 = librosa.load(murmur_file,duration=5)
dur=librosa.get_duration(y=y2, sr=sr2)
print ("duration:", dur)
print(y2.shape,sr2)
duration: 5.0
(110250,) 22050
In [17]:
# hear it
import IPython.display as ipd
ipd.Audio(murmur_file) 
Out[17]: