Cainvas

RAIN PREDICTION

Credit: AITS Cainvas Community

Photo by LISTENXU on Dribbble

TABLE OF CONTENTS

  1. IMPORTING LIBRARIES

  2. LOADING DATA

  3. DATA VISUALIZATION AND CLEANING

  4. DATA PREPROCESSING

  5. MODEL BUILDING

  6. CONCLUSIONS

  7. END

IMPORTING LIBRARIES

In [1]:
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
from sklearn.preprocessing import LabelEncoder
from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import Dense, BatchNormalization, Dropout, LSTM
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import regularizers
from sklearn.metrics import precision_score, recall_score, confusion_matrix, classification_report, accuracy_score, f1_score
from tensorflow.keras import callbacks
import numpy as np
import pandas as pd
np.random.seed(0)

LOADING DATA

In [2]:
data = pd.read_csv("https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/weatherAUS_N9NfQul.csv")
data.head()
Out[2]:
Date Location MinTemp MaxTemp Rainfall Evaporation Sunshine WindGustDir WindGustSpeed WindDir9am ... Humidity9am Humidity3pm Pressure9am Pressure3pm Cloud9am Cloud3pm Temp9am Temp3pm RainToday RainTomorrow
0 2008-12-01 Albury 13.4 22.9 0.6 NaN NaN W 44.0 W ... 71.0 22.0 1007.7 1007.1 8.0 NaN 16.9 21.8 No No
1 2008-12-02 Albury 7.4 25.1 0.0 NaN NaN WNW 44.0 NNW ... 44.0 25.0 1010.6 1007.8 NaN NaN 17.2 24.3 No No
2 2008-12-03 Albury 12.9 25.7 0.0 NaN NaN WSW 46.0 W ... 38.0 30.0 1007.6 1008.7 NaN 2.0 21.0 23.2 No No
3 2008-12-04 Albury 9.2 28.0 0.0 NaN NaN NE 24.0 SE ... 45.0 16.0 1017.6 1012.8 NaN NaN 18.1 26.5 No No
4 2008-12-05 Albury 17.5 32.3 1.0 NaN NaN W 41.0 ENE ... 82.0 33.0 1010.8 1006.0 7.0 8.0 17.8 29.7 No No

5 rows × 23 columns

About the data:

The dataset contains about 10 years of daily weather observations drawn from numerous weather stations across Australia.

In this project, I will use this data to predict whether it will rain the next day. There are 23 columns, including the target variable "RainTomorrow", which indicates whether it rained on the following day.

In [3]:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 145460 entries, 0 to 145459
Data columns (total 23 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   Date           145460 non-null  object 
 1   Location       145460 non-null  object 
 2   MinTemp        143975 non-null  float64
 3   MaxTemp        144199 non-null  float64
 4   Rainfall       142199 non-null  float64
 5   Evaporation    82670 non-null   float64
 6   Sunshine       75625 non-null   float64
 7   WindGustDir    135134 non-null  object 
 8   WindGustSpeed  135197 non-null  float64
 9   WindDir9am     134894 non-null  object 
 10  WindDir3pm     141232 non-null  object 
 11  WindSpeed9am   143693 non-null  float64
 12  WindSpeed3pm   142398 non-null  float64
 13  Humidity9am    142806 non-null  float64
 14  Humidity3pm    140953 non-null  float64
 15  Pressure9am    130395 non-null  float64
 16  Pressure3pm    130432 non-null  float64
 17  Cloud9am       89572 non-null   float64
 18  Cloud3pm       86102 non-null   float64
 19  Temp9am        143693 non-null  float64
 20  Temp3pm        141851 non-null  float64
 21  RainToday      142199 non-null  object 
 22  RainTomorrow   142193 non-null  object 
dtypes: float64(16), object(7)
memory usage: 25.5+ MB

DATA VISUALIZATION AND CLEANING

Steps involved in this section:

Count plot of target column

Correlation amongst numeric attributes

Parse Dates into datetime

Encoding days and months as continuous cyclic features

In [4]:
# First, look at the target distribution to see whether the classes are imbalanced
cols= ["#C2C4E2","#EED4E5"]
sns.countplot(x= data["RainTomorrow"], palette= cols)
Out[4]:
<AxesSubplot:xlabel='RainTomorrow', ylabel='count'>
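The count plot answers the imbalance question visually. A minimal sketch of the same check in numbers follows. It uses a small made-up label list rather than the real `RainTomorrow` column, and the helper name `class_shares` is hypothetical.

```python
from collections import Counter

def class_shares(labels):
    """Return each class's share of the non-missing labels."""
    present = [lab for lab in labels if lab is not None]
    counts = Counter(present)
    return {cls: n / len(present) for cls, n in counts.items()}

# Toy labels for illustration only; None stands in for a missing value.
toy_labels = ["No", "No", "Yes", "No", None, "No", "Yes", "No", "No", "No"]
shares = class_shares(toy_labels)

# Accuracy of a model that always predicts the most frequent class.
majority_baseline = max(shares.values())
print(shares)
print(majority_baseline)
```

On imbalanced data, a model that always predicts the majority class already reaches `majority_baseline` accuracy, so any accuracy figure is best read against that number.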
In [5]:
# Correlation amongst numeric attributes
# Only numeric columns are correlated; the object columns are skipped explicitly
corrmat = data.select_dtypes(include="number").corr()
cmap = sns.diverging_palette(260,-10,s=50, l=75, n=6, as_cmap=True)
plt.subplots(figsize=(18,18))
sns.heatmap(corrmat,cmap= cmap,annot=True, square=True,fmt=".0%")
Out[5]:
<AxesSubplot:>
In [6]:
#Parsing datetime
#exploring the length of date objects
lengths = data["Date"].str.len()
lengths.value_counts()
Out[6]:
10    145460
Name: Date, dtype: int64
In [7]:
data['Date']= pd.to_datetime(data["Date"])
# Creating a column for the year
data['year'] = data.Date.dt.year

# Function to encode a datetime component as cyclic (sine/cosine) features.
# The data will feed a neural network, so I prefer months and days as continuous cyclic features.

def encode(data, col, max_val):
    data[col + '_sin'] = np.sin(2 * np.pi * data[col]/max_val)
    data[col + '_cos'] = np.cos(2 * np.pi * data[col]/max_val)
    return data

data['month'] = data.Date.dt.month
data = encode(data, 'month', 12)

data['day'] = data.Date.dt.day
data = encode(data, 'day', 31)

data.head()
Out[7]:
Date Location MinTemp MaxTemp Rainfall Evaporation Sunshine WindGustDir WindGustSpeed WindDir9am ... Temp3pm RainToday RainTomorrow year month month_sin month_cos day day_sin day_cos
0 2008-12-01 Albury 13.4 22.9 0.6 NaN NaN W 44.0 W ... 21.8 No No 2008 12 -2.449294e-16 1.0 1 0.201299 0.979530
1 2008-12-02 Albury 7.4 25.1 0.0 NaN NaN WNW 44.0 NNW ... 24.3 No No 2008 12 -2.449294e-16 1.0 2 0.394356 0.918958
2 2008-12-03 Albury 12.9 25.7 0.0 NaN NaN WSW 46.0 W ... 23.2 No No 2008 12 -2.449294e-16 1.0 3 0.571268 0.820763
3 2008-12-04 Albury 9.2 28.0 0.0 NaN NaN NE 24.0 SE ... 26.5 No No 2008 12 -2.449294e-16 1.0 4 0.724793 0.688967
4 2008-12-05 Albury 17.5 32.3 1.0 NaN NaN W 41.0 ENE ... 29.7 No No 2008 12 -2.449294e-16 1.0 5 0.848644 0.528964

5 rows × 30 columns

In [8]:
cyclic_month = sns.scatterplot(x="month_sin",y="month_cos",data=data, color="#C2C4E2")
cyclic_month.set_title("Cyclic Encoding of Month")
cyclic_month.set_ylabel("Cosine Encoded Months")
cyclic_month.set_xlabel("Sine Encoded Months")
Out[8]:
Text(0.5, 0, 'Sine Encoded Months')
In [9]:
cyclic_day = sns.scatterplot(x='day_sin',y='day_cos',data=data, color="#C2C4E2")
cyclic_day.set_title("Cyclic Encoding of Day")
cyclic_day.set_ylabel("Cosine Encoded Day")
cyclic_day.set_xlabel("Sine Encoded Day")
Out[9]:
Text(0.5, 0, 'Sine Encoded Day')
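The two scatter plots show the encoded values lying on a circle. A small sketch of why that helps: on the raw scale December (12) and January (1) are 11 apart, while on the circle they are neighbours. The helper names `encode_cyclic` and `distance` are illustrative and not part of the notebook.

```python
import math

def encode_cyclic(value, max_val):
    """Map a periodic value onto the unit circle, mirroring encode() above."""
    angle = 2 * math.pi * value / max_val
    return math.sin(angle), math.cos(angle)

def distance(a, b):
    """Euclidean distance between two (sin, cos) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

dec, jan, jun = (encode_cyclic(m, 12) for m in (12, 1, 6))

raw_gap_dec_jan = abs(12 - 1)          # 11 on the raw month scale
enc_gap_dec_jan = distance(dec, jan)   # neighbouring months on the circle
enc_gap_dec_jun = distance(dec, jun)   # opposite sides of the circle
print(raw_gap_dec_jan, enc_gap_dec_jan, enc_gap_dec_jun)
```

One caveat on the day encoding: with `max_val` fixed at 31, months shorter than 31 days do not close the circle exactly, so the step from the last day of such a month to the 1st is slightly larger than the other daily steps.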
In [10]:
# Get list of categorical variables
s = (data.dtypes == "object")
object_cols = list(s[s].index)

print("Categorical variables:")
print(object_cols)
Categorical variables:
['Location', 'WindGustDir', 'WindDir9am', 'WindDir3pm', 'RainToday', 'RainTomorrow']
In [11]:
# Missing values in categorical variables

for i in object_cols:
    print(i, data[i].isnull().sum())
Location 0
WindGustDir 10326
WindDir9am 10566
WindDir3pm 4228
RainToday 3261
RainTomorrow 3267
In [12]:
# Fill missing categorical values with the most frequent value of each column
for i in object_cols:
    data[i] = data[i].fillna(data[i].mode()[0])
In [13]:
# Get list of numeric variables
t = (data.dtypes == "float64")
num_cols = list(t[t].index)

print("Numeric variables:")
print(num_cols)
Numeric variables:
['MinTemp', 'MaxTemp', 'Rainfall', 'Evaporation', 'Sunshine', 'WindGustSpeed', 'WindSpeed9am', 'WindSpeed3pm', 'Humidity9am', 'Humidity3pm', 'Pressure9am', 'Pressure3pm', 'Cloud9am', 'Cloud3pm', 'Temp9am', 'Temp3pm', 'month_sin', 'month_cos', 'day_sin', 'day_cos']
In [14]:
# Missing values in numeric variables

for i in num_cols:
    print(i, data[i].isnull().sum())
MinTemp 1485
MaxTemp 1261
Rainfall 3261
Evaporation 62790
Sunshine 69835
WindGustSpeed 10263
WindSpeed9am 1767
WindSpeed3pm 3062
Humidity9am 2654
Humidity3pm 4507
Pressure9am 15065
Pressure3pm 15028
Cloud9am 55888
Cloud3pm 59358
Temp9am 1767
Temp3pm 3609
month_sin 0
month_cos 0
day_sin 0
day_cos 0
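The raw counts are easier to judge as a share of all 145,460 rows. The short calculation below does this for the four sparsest columns, using only the counts printed above.

```python
# Counts copied from the missing-value listing above.
total_rows = 145460
missing = {
    "Evaporation": 62790,
    "Sunshine": 69835,
    "Cloud9am": 55888,
    "Cloud3pm": 59358,
}

# Share of rows with no value, in percent.
missing_pct = {col: 100 * n / total_rows for col, n in missing.items()}
for col, pct in sorted(missing_pct.items(), key=lambda kv: -kv[1]):
    print(f"{col:12s} {pct:5.1f}%")
```

Roughly 38% to 48% of the values in these four columns are missing, so whatever fill strategy is used replaces a large share of each of them with a single constant.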

Numerical variables

Filling missing values with the median of each column

In [15]:
# Filling missing values with the median of each column

for i in num_cols:
    data[i] = data[i].fillna(data[i].median())
    
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 145460 entries, 0 to 145459
Data columns (total 30 columns):
 #   Column         Non-Null Count   Dtype         
---  ------         --------------   -----         
 0   Date           145460 non-null  datetime64[ns]
 1   Location       145460 non-null  object        
 2   MinTemp        145460 non-null  float64       
 3   MaxTemp        145460 non-null  float64       
 4   Rainfall       145460 non-null  float64       
 5   Evaporation    145460 non-null  float64       
 6   Sunshine       145460 non-null  float64       
 7   WindGustDir    145460 non-null  object        
 8   WindGustSpeed  145460 non-null  float64       
 9   WindDir9am     145460 non-null  object        
 10  WindDir3pm     145460 non-null  object        
 11  WindSpeed9am   145460 non-null  float64       
 12  WindSpeed3pm   145460 non-null  float64       
 13  Humidity9am    145460 non-null  float64       
 14  Humidity3pm    145460 non-null  float64       
 15  Pressure9am    145460 non-null  float64       
 16  Pressure3pm    145460 non-null  float64       
 17  Cloud9am       145460 non-null  float64       
 18  Cloud3pm       145460 non-null  float64       
 19  Temp9am        145460 non-null  float64       
 20  Temp3pm        145460 non-null  float64       
 21  RainToday      145460 non-null  object        
 22  RainTomorrow   145460 non-null  object        
 23  year           145460 non-null  int64         
 24  month          145460 non-null  int64         
 25  month_sin      145460 non-null  float64       
 26  month_cos      145460 non-null  float64       
 27  day            145460 non-null  int64         
 28  day_sin        145460 non-null  float64       
 29  day_cos        145460 non-null  float64       
dtypes: datetime64[ns](1), float64(20), int64(3), object(6)
memory usage: 33.3+ MB
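A small illustration of why the median is a reasonable fill value for skewed columns such as Rainfall: a single extreme reading drags the mean upward but barely moves the median. The numbers are made up and the helper `fill_missing` is hypothetical.

```python
from statistics import mean, median

def fill_missing(values, fill_value):
    """Replace None entries with fill_value, leaving observed values untouched."""
    return [fill_value if v is None else v for v in values]

# Toy rainfall-like column: mostly small values, one extreme reading, two gaps.
toy = [0.0, 0.2, None, 0.0, 1.0, 95.0, None, 0.4]
observed = [v for v in toy if v is not None]

med = median(observed)   # middle of the sorted observed values
avg = mean(observed)     # pulled upward by the single extreme reading
filled = fill_missing(toy, med)
print(med, avg, filled)
```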

DATA PREPROCESSING

Steps involved in Data Preprocessing:

Label encoding columns with categorical data

Perform the scaling of the features

Detecting outliers

Dropping the outliers based on data analysis

Label encoding the categorical variables
In [16]:
# Apply label encoder to each column with categorical data
label_encoder = LabelEncoder()
for i in object_cols:
    data[i] = label_encoder.fit_transform(data[i])
    
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 145460 entries, 0 to 145459
Data columns (total 30 columns):
 #   Column         Non-Null Count   Dtype         
---  ------         --------------   -----         
 0   Date           145460 non-null  datetime64[ns]
 1   Location       145460 non-null  int64         
 2   MinTemp        145460 non-null  float64       
 3   MaxTemp        145460 non-null  float64       
 4   Rainfall       145460 non-null  float64       
 5   Evaporation    145460 non-null  float64       
 6   Sunshine       145460 non-null  float64       
 7   WindGustDir    145460 non-null  int64         
 8   WindGustSpeed  145460 non-null  float64       
 9   WindDir9am     145460 non-null  int64         
 10  WindDir3pm     145460 non-null  int64         
 11  WindSpeed9am   145460 non-null  float64       
 12  WindSpeed3pm   145460 non-null  float64       
 13  Humidity9am    145460 non-null  float64       
 14  Humidity3pm    145460 non-null  float64       
 15  Pressure9am    145460 non-null  float64       
 16  Pressure3pm    145460 non-null  float64       
 17  Cloud9am       145460 non-null  float64       
 18  Cloud3pm       145460 non-null  float64       
 19  Temp9am        145460 non-null  float64       
 20  Temp3pm        145460 non-null  float64       
 21  RainToday      145460 non-null  int64         
 22  RainTomorrow   145460 non-null  int64         
 23  year           145460 non-null  int64         
 24  month          145460 non-null  int64         
 25  month_sin      145460 non-null  float64       
 26  month_cos      145460 non-null  float64       
 27  day            145460 non-null  int64         
 28  day_sin        145460 non-null  float64       
 29  day_cos        145460 non-null  float64       
dtypes: datetime64[ns](1), float64(20), int64(9)
memory usage: 33.3 MB
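As a rough picture of what label encoding does, the sketch below assigns integer codes to the sorted unique labels, which is the ordering `LabelEncoder` uses. The function name and the toy values are illustrative only.

```python
def fit_label_mapping(values):
    """Assign integer codes to the sorted unique labels."""
    return {label: code for code, label in enumerate(sorted(set(values)))}

# Toy wind directions, for illustration only.
toy_dirs = ["W", "WNW", "NE", "W", "E"]
mapping = fit_label_mapping(toy_dirs)
encoded = [mapping[d] for d in toy_dirs]
print(mapping)   # codes follow alphabetical order, not compass order
print(encoded)
```

Under this ordering "No" becomes 0 and "Yes" becomes 1 for `RainToday` and `RainTomorrow`. For the wind directions the integer codes carry an alphabetical order that has no physical meaning, which is worth keeping in mind when interpreting the model.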
In [17]:
features = data.drop(['RainTomorrow', 'Date','day', 'month'], axis=1) # dropping target and extra columns

target = data['RainTomorrow']

#Set up a standard scaler for the features
col_names = list(features.columns)
s_scaler = preprocessing.StandardScaler()
features = s_scaler.fit_transform(features)
features = pd.DataFrame(features, columns=col_names) 

features.describe().T
Out[17]:
count mean std min 25% 50% 75% max
Location 145460.0 7.815677e-18 1.000003 -1.672228 -0.899139 0.014511 0.857881 1.701250
MinTemp 145460.0 -4.501830e-16 1.000003 -3.250525 -0.705659 -0.030170 0.723865 3.410112
MaxTemp 145460.0 3.001220e-16 1.000003 -3.952405 -0.735852 -0.086898 0.703133 3.510563
Rainfall 145460.0 7.815677e-18 1.000003 -0.275097 -0.275097 -0.275097 -0.203581 43.945571
Evaporation 145460.0 -3.282584e-17 1.000003 -1.629472 -0.371139 -0.119472 0.006361 43.985108
Sunshine 145460.0 -5.424080e-16 1.000003 -2.897217 0.076188 0.148710 0.257494 2.360634
WindGustDir 145460.0 6.252542e-18 1.000003 -1.724209 -0.872075 0.193094 1.045228 1.471296
WindGustSpeed 145460.0 1.824961e-16 1.000003 -2.588407 -0.683048 -0.073333 0.460168 7.243246
WindDir9am 145460.0 7.190423e-17 1.000003 -1.550000 -0.885669 0.000105 0.885879 1.771653
WindDir3pm 145460.0 8.284618e-17 1.000003 -1.718521 -0.837098 0.044324 0.925747 1.586813
WindSpeed9am 145460.0 5.627287e-17 1.000003 -1.583291 -0.793380 -0.116314 0.560752 13.086472
WindSpeed3pm 145460.0 6.565169e-17 1.000003 -2.141841 -0.650449 0.037886 0.611499 7.839016
Humidity9am 145460.0 2.250915e-16 1.000003 -3.654212 -0.631189 0.058273 0.747734 1.649338
Humidity3pm 145460.0 -8.440931e-17 1.000003 -2.518329 -0.710918 0.021816 0.656852 2.366565
Pressure9am 145460.0 -4.314254e-16 1.000003 -5.520544 -0.616005 -0.006653 0.617561 3.471111
Pressure3pm 145460.0 5.027043e-15 1.000003 -5.724832 -0.622769 -0.007520 0.622735 3.653960
Cloud9am 145460.0 -1.016038e-16 1.000003 -2.042425 -0.727490 0.149133 0.587445 1.902380
Cloud3pm 145460.0 7.346736e-17 1.000003 -2.235619 -0.336969 0.137693 0.612356 2.036343
Temp9am 145460.0 7.503050e-17 1.000003 -3.750358 -0.726764 -0.044517 0.699753 3.599302
Temp3pm 145460.0 -6.877796e-17 1.000003 -3.951301 -0.725322 -0.083046 0.661411 3.653834
RainToday 145460.0 -8.988029e-18 1.000003 -0.529795 -0.529795 -0.529795 -0.529795 1.887521
year 145460.0 2.080221e-14 1.000003 -2.273637 -0.697391 0.090732 0.878855 1.666978
month_sin 145460.0 -4.884798e-17 1.000003 -1.434333 -0.725379 -0.016425 0.692529 1.401483
month_cos 145460.0 -2.745257e-17 1.000003 -1.388032 -1.198979 0.023080 0.728636 1.434192
day_sin 145460.0 3.565903e-18 1.000003 -1.403140 -1.019170 -0.003198 1.012774 1.396744
day_cos 145460.0 -1.413538e-17 1.000003 -1.392587 -1.055520 -0.044639 1.011221 1.455246
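The table shows a standard deviation of 1.000003 rather than exactly 1 for every column. A likely explanation, sketched below, is that `StandardScaler` divides by the population standard deviation while `describe()` reports the sample standard deviation. The toy column reuses the five MinTemp values from `data.head()`; the helper `standardize` is illustrative.

```python
import math
from statistics import mean, pstdev, stdev

def standardize(values):
    """z = (x - mean) / population_std, the convention StandardScaler uses."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

toy = [13.4, 7.4, 12.9, 9.2, 17.5]   # MinTemp values from data.head()
z = standardize(toy)

# describe() uses n - 1 in the denominator, so a column scaled to
# population std 1 shows sqrt(n / (n - 1)) instead.
n_rows = 145460
expected_describe_std = math.sqrt(n_rows / (n_rows - 1))
print(round(pstdev(z), 6), round(stdev(z), 6), round(expected_describe_std, 6))
```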
In [18]:
#Detecting outliers
#looking at the scaled features
colours = ["#D0DBEE", "#C2C4E2", "#EED4E5", "#D1E6DC", "#BDE2E2"]
plt.figure(figsize=(20,10))
sns.boxenplot(data = features,palette = colours)
plt.xticks(rotation=90)
plt.show()

MODEL BUILDING

The following steps are involved in model building:

Assigning X and y as the features and the target

Splitting test and training sets

Initialising the neural network

Defining by adding layers

Compiling the neural network

Train the neural network

In [19]:
# Re-attach the target so that rows dropped below stay aligned with their labels
features["RainTomorrow"] = target

# Dropping rows with outliers

features = features[(features["MinTemp"]<2.3)&(features["MinTemp"]>-2.3)]
features = features[(features["MaxTemp"]<2.3)&(features["MaxTemp"]>-2)]
features = features[(features["Rainfall"]<4.5)]
features = features[(features["Evaporation"]<2.8)]
features = features[(features["Sunshine"]<2.1)]
features = features[(features["WindGustSpeed"]<4)&(features["WindGustSpeed"]>-4)]
features = features[(features["WindSpeed9am"]<4)]
features = features[(features["WindSpeed3pm"]<2.5)]
features = features[(features["Humidity9am"]>-3)]
features = features[(features["Humidity3pm"]>-2.2)]
features = features[(features["Pressure9am"]< 2)&(features["Pressure9am"]>-2.7)]
features = features[(features["Pressure3pm"]< 2)&(features["Pressure3pm"]>-2.7)]
features = features[(features["Cloud9am"]<1.8)]
features = features[(features["Cloud3pm"]<2)]
features = features[(features["Temp9am"]<2.3)&(features["Temp9am"]>-2)]
features = features[(features["Temp3pm"]<2.3)&(features["Temp3pm"]>-2)]
In [20]:
X = features.drop(["RainTomorrow"], axis=1)
y = features["RainTomorrow"]

# Splitting test and training sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)

X.shape
Out[20]:
(127536, 26)
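The shape above makes it possible to work out how many rows each stage sees. The rounding rules in this sketch are assumptions about how `train_test_split` and the Keras `validation_split` argument divide the rows; they are used here because they reproduce the 2551 steps per epoch reported in the training log.

```python
import math

rows_before = 145460      # from data.info()
rows_after = 127536       # from X.shape after the outlier filters
kept_share = rows_after / rows_before

test_size, validation_split, batch_size = 0.2, 0.2, 32

# Assumed rounding: test rows rounded up, validation rows taken from the
# training rows with the remaining training part rounded down.
n_test = math.ceil(rows_after * test_size)
n_train = rows_after - n_test
n_fit = math.floor(n_train * (1 - validation_split))
n_val = n_train - n_fit
steps_per_epoch = math.ceil(n_fit / batch_size)

print(round(kept_share, 3), n_train, n_test, n_fit, n_val, steps_per_epoch)
```

About 88% of the rows survive the outlier filters, and the network is fitted on 81,622 of them, with the rest held out for validation and testing.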
In [21]:
early_stopping = callbacks.EarlyStopping(
    min_delta=0.001, # minimum amount of change to count as an improvement
    patience=20, # how many epochs to wait before stopping
    restore_best_weights=True,
)

# Initialising the NN
model = Sequential()

# layers

model.add(Dense(units = 32, kernel_initializer = 'uniform', activation = 'relu', input_dim = 26))
model.add(Dense(units = 32, kernel_initializer = 'uniform', activation = 'relu'))
model.add(Dense(units = 16, kernel_initializer = 'uniform', activation = 'relu'))
model.add(Dropout(0.25))
model.add(Dense(units = 8, kernel_initializer = 'uniform', activation = 'relu'))
model.add(Dropout(0.5))
model.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))

# Compiling the ANN
opt = Adam(learning_rate=0.00009)
model.compile(optimizer = opt, loss = 'binary_crossentropy', metrics = ['accuracy'])

# Train the ANN
history = model.fit(X_train, y_train, batch_size = 32, epochs = 150, callbacks=[early_stopping], validation_split=0.2)
Epoch 1/150
2551/2551 [==============================] - 5s 2ms/step - loss: 0.4680 - accuracy: 0.7842 - val_loss: 0.3906 - val_accuracy: 0.7860
Epoch 2/150
2551/2551 [==============================] - 5s 2ms/step - loss: 0.4073 - accuracy: 0.8092 - val_loss: 0.3803 - val_accuracy: 0.8408
Epoch 3/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3996 - accuracy: 0.8128 - val_loss: 0.3733 - val_accuracy: 0.8427
Epoch 4/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3962 - accuracy: 0.8149 - val_loss: 0.3689 - val_accuracy: 0.8446
Epoch 5/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3950 - accuracy: 0.8141 - val_loss: 0.3676 - val_accuracy: 0.8447
Epoch 6/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3937 - accuracy: 0.8147 - val_loss: 0.3664 - val_accuracy: 0.8443
Epoch 7/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3922 - accuracy: 0.8156 - val_loss: 0.3653 - val_accuracy: 0.8445
Epoch 8/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3916 - accuracy: 0.8152 - val_loss: 0.3651 - val_accuracy: 0.8454
Epoch 9/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3892 - accuracy: 0.8167 - val_loss: 0.3642 - val_accuracy: 0.8449
Epoch 10/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3895 - accuracy: 0.8159 - val_loss: 0.3636 - val_accuracy: 0.8450
Epoch 11/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3870 - accuracy: 0.8174 - val_loss: 0.3632 - val_accuracy: 0.8453
Epoch 12/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3877 - accuracy: 0.8164 - val_loss: 0.3622 - val_accuracy: 0.8452
Epoch 13/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3872 - accuracy: 0.8166 - val_loss: 0.3613 - val_accuracy: 0.8448
Epoch 14/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3873 - accuracy: 0.8167 - val_loss: 0.3606 - val_accuracy: 0.8454
Epoch 15/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3846 - accuracy: 0.8162 - val_loss: 0.3605 - val_accuracy: 0.8446
Epoch 16/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3852 - accuracy: 0.8174 - val_loss: 0.3597 - val_accuracy: 0.8457
Epoch 17/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3853 - accuracy: 0.8161 - val_loss: 0.3599 - val_accuracy: 0.8461
Epoch 18/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3843 - accuracy: 0.8171 - val_loss: 0.3598 - val_accuracy: 0.8454
Epoch 19/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3843 - accuracy: 0.8170 - val_loss: 0.3590 - val_accuracy: 0.8457
Epoch 20/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3847 - accuracy: 0.8166 - val_loss: 0.3586 - val_accuracy: 0.8459
Epoch 21/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3837 - accuracy: 0.8183 - val_loss: 0.3580 - val_accuracy: 0.8453
Epoch 22/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3825 - accuracy: 0.8171 - val_loss: 0.3592 - val_accuracy: 0.8453
Epoch 23/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3836 - accuracy: 0.8177 - val_loss: 0.3580 - val_accuracy: 0.8462
Epoch 24/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3845 - accuracy: 0.8175 - val_loss: 0.3578 - val_accuracy: 0.8466
Epoch 25/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3840 - accuracy: 0.8160 - val_loss: 0.3583 - val_accuracy: 0.8459
Epoch 26/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3822 - accuracy: 0.8169 - val_loss: 0.3574 - val_accuracy: 0.8461
Epoch 27/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3820 - accuracy: 0.8185 - val_loss: 0.3570 - val_accuracy: 0.8468
Epoch 28/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3823 - accuracy: 0.8159 - val_loss: 0.3594 - val_accuracy: 0.8453
Epoch 29/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3819 - accuracy: 0.8169 - val_loss: 0.3567 - val_accuracy: 0.8468
Epoch 30/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3810 - accuracy: 0.8172 - val_loss: 0.3577 - val_accuracy: 0.8455
Epoch 31/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3808 - accuracy: 0.8175 - val_loss: 0.3575 - val_accuracy: 0.8462
Epoch 32/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3826 - accuracy: 0.8159 - val_loss: 0.3570 - val_accuracy: 0.8465
Epoch 33/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3806 - accuracy: 0.8169 - val_loss: 0.3560 - val_accuracy: 0.8461
Epoch 34/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3803 - accuracy: 0.8170 - val_loss: 0.3566 - val_accuracy: 0.8459
Epoch 35/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3803 - accuracy: 0.8180 - val_loss: 0.3567 - val_accuracy: 0.8458
Epoch 36/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3792 - accuracy: 0.8179 - val_loss: 0.3564 - val_accuracy: 0.8458
Epoch 37/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3787 - accuracy: 0.8180 - val_loss: 0.3567 - val_accuracy: 0.8470
Epoch 38/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3804 - accuracy: 0.8165 - val_loss: 0.3556 - val_accuracy: 0.8461
Epoch 39/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3822 - accuracy: 0.8167 - val_loss: 0.3566 - val_accuracy: 0.8462
Epoch 40/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3795 - accuracy: 0.8182 - val_loss: 0.3564 - val_accuracy: 0.8462
Epoch 41/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3805 - accuracy: 0.8178 - val_loss: 0.3564 - val_accuracy: 0.8461
Epoch 42/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3794 - accuracy: 0.8178 - val_loss: 0.3560 - val_accuracy: 0.8455
Epoch 43/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3804 - accuracy: 0.8164 - val_loss: 0.3558 - val_accuracy: 0.8457
Epoch 44/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3804 - accuracy: 0.8176 - val_loss: 0.3553 - val_accuracy: 0.8469
Epoch 45/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3798 - accuracy: 0.8180 - val_loss: 0.3572 - val_accuracy: 0.8460
Epoch 46/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3808 - accuracy: 0.8163 - val_loss: 0.3560 - val_accuracy: 0.8460
Epoch 47/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3773 - accuracy: 0.8175 - val_loss: 0.3557 - val_accuracy: 0.8465
Epoch 48/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3796 - accuracy: 0.8177 - val_loss: 0.3557 - val_accuracy: 0.8461
Epoch 49/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3781 - accuracy: 0.8168 - val_loss: 0.3553 - val_accuracy: 0.8466
Epoch 50/150
2551/2551 [==============================] - 5s 2ms/step - loss: 0.3790 - accuracy: 0.8179 - val_loss: 0.3559 - val_accuracy: 0.8465
Epoch 51/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3784 - accuracy: 0.8167 - val_loss: 0.3560 - val_accuracy: 0.8460
Epoch 52/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3776 - accuracy: 0.8169 - val_loss: 0.3559 - val_accuracy: 0.8458
Epoch 53/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3791 - accuracy: 0.8169 - val_loss: 0.3554 - val_accuracy: 0.8468
Epoch 54/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3786 - accuracy: 0.8177 - val_loss: 0.3574 - val_accuracy: 0.8453
Epoch 55/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3767 - accuracy: 0.8185 - val_loss: 0.3558 - val_accuracy: 0.8464
Epoch 56/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3765 - accuracy: 0.8183 - val_loss: 0.3554 - val_accuracy: 0.8472
Epoch 57/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3778 - accuracy: 0.8169 - val_loss: 0.3565 - val_accuracy: 0.8465
Epoch 58/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3762 - accuracy: 0.8179 - val_loss: 0.3549 - val_accuracy: 0.8473
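The log ends at epoch 58 of the 150 requested, which is the kind of outcome the early stopping callback is meant to produce. The sketch below is a simplified model of that rule written for illustration, not the Keras implementation: an epoch counts as an improvement only if the validation loss drops by more than `min_delta`, and training stops once `patience` epochs pass without one.

```python
def early_stop_epoch(val_losses, min_delta=0.001, patience=20):
    """Return the 1-based epoch at which training would stop, or None.

    An epoch counts as an improvement only if its loss is more than
    min_delta below the best loss seen so far.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Toy curve: clear gains for 5 epochs, then a plateau with tiny changes.
toy_losses = [0.50, 0.45, 0.42, 0.40, 0.39] + [0.3895] * 10
print(early_stop_epoch(toy_losses, min_delta=0.001, patience=3))   # stops at epoch 8
print(early_stop_epoch(toy_losses, min_delta=0.001, patience=20))  # None: plateau too short
```

With `restore_best_weights=True`, as set in the notebook, the model is returned to the weights of its best epoch rather than those of the last one.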
In [22]:
history_df = pd.DataFrame(history.history)

plt.plot(history_df.loc[:, ['loss']], "#BDE2E2", label='Training loss')
plt.plot(history_df.loc[:, ['val_loss']],"#C2C4E2", label='Validation loss')
plt.title('Training and Validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend(loc="best")

plt.show()
In [23]:
history_df = pd.DataFrame(history.history)

plt.plot(history_df.loc[:, ['accuracy']], "#BDE2E2", label='Training accuracy')
plt.plot(history_df.loc[:, ['val_accuracy']], "#C2C4E2", label='Validation accuracy')

plt.title('Training and Validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

CONCLUSIONS

The model is evaluated by:

Testing on the test set

Evaluating the confusion matrix

Evaluating the classification report

In [24]:
# Predicting the test set results
y_pred = model.predict(X_test)
y_pred = (y_pred > 0.5)
In [25]:
# confusion matrix
cmap1 = sns.diverging_palette(260,-10,s=50, l=75, n=5, as_cmap=True)
plt.subplots(figsize=(12,8))
cf_matrix = confusion_matrix(y_test, y_pred)
sns.heatmap(cf_matrix/np.sum(cf_matrix), cmap = cmap1, annot = True, annot_kws = {'size':15})
Out[25]:
<AxesSubplot:>
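The classification report listed in the steps above summarises the confusion matrix as precision, recall and F1. The sketch below shows how those figures follow from the four cell counts. The counts are hypothetical and are not the notebook's results; the helper `binary_metrics` is illustrative.

```python
def binary_metrics(tn, fp, fn, tp):
    """Accuracy, precision, recall and F1 for the positive class."""
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts, laid out like confusion_matrix(y_test, y_pred).ravel()
# for labels 0 ("No") and 1 ("Yes"). These are not the notebook's results.
toy = binary_metrics(tn=80, fp=5, fn=9, tp=6)
for name, value in toy.items():
    print(f"{name:9s} {value:.3f}")
```

In the notebook itself, `classification_report(y_test, y_pred)` (already imported) would print the same quantities per class. With an imbalanced target, recall on the rainy class is usually more informative than overall accuracy.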
In [26]:
model.save('rain.h5')

Deep CC

In [27]:
!deepCC rain.h5
[INFO]
Reading [keras model] 'rain.h5'
[SUCCESS]
Saved 'rain_deepC/rain.onnx'
[INFO]
Reading [onnx model] 'rain_deepC/rain.onnx'
[INFO]
Model info:
  ir_vesion : 4
  doc       : 
[WARNING]
[ONNX]: terminal (input/output) dense_input's shape is less than 1. Changing it to 1.
[WARNING]
[ONNX]: terminal (input/output) dense_4's shape is less than 1. Changing it to 1.
WARN (GRAPH): found operator node with the same name (dense_4) as io node.
[INFO]
Running DNNC graph sanity check ...
[SUCCESS]
Passed sanity check.
[INFO]
Writing C++ file 'rain_deepC/rain.cpp'
[INFO]
deepSea model files are ready in 'rain_deepC/' 
[RUNNING COMMAND]
g++ -std=c++11 -O3 -fno-rtti -fno-exceptions -I. -I/opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/include -isystem /opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/packages/eigen-eigen-323c052e1731 "rain_deepC/rain.cpp" -D_AITS_MAIN -o "rain_deepC/rain.exe"
[RUNNING COMMAND]
size "rain_deepC/rain.exe"
   text	   data	    bss	    dec	    hex	filename
 131627	   2968	    760	 135355	  210bb	rain_deepC/rain.exe
[SUCCESS]
Saved model as executable "rain_deepC/rain.exe"
In [28]:
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 32)                864       
_________________________________________________________________
dense_1 (Dense)              (None, 32)                1056      
_________________________________________________________________
dense_2 (Dense)              (None, 16)                528       
_________________________________________________________________
dropout (Dropout)            (None, 16)                0         
_________________________________________________________________
dense_3 (Dense)              (None, 8)                 136       
_________________________________________________________________
dropout_1 (Dropout)          (None, 8)                 0         
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 9         
=================================================================
Total params: 2,593
Trainable params: 2,593
Non-trainable params: 0
_________________________________________________________________
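The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit, and dropout layers add none. The helper `dense_params` is illustrative.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

layer_sizes = [26, 32, 32, 16, 8, 1]   # input width, then units per Dense layer
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)
```

The result matches the 864, 1056, 528, 136 and 9 parameters listed above, for a total of 2,593.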