
Online Shopper's Intention Prediction

Credit: AITS Cainvas Community

Photo by Karol Cichoń on Dribbble

Predict a customer's purchasing intention on online shopping websites, for use in KPI and marketing analysis.

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from tensorflow.keras import models, optimizers, losses, layers, callbacks
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import random
import warnings
warnings.filterwarnings("ignore")

The dataset

  1. C. Okan Sakar Department of Computer Engineering, Faculty of Engineering and Natural Sciences, Bahcesehir University, 34349 Besiktas, Istanbul, Turkey
  2. Yomi Kastro Inveon Information Technologies Consultancy and Trade, 34335 Istanbul, Turkey

Sakar, C.O., Polat, S.O., Katircioglu, M., et al. Neural Computing and Applications (2018).

The dataset is a CSV file with 18 columns (10 numerical and 8 categorical); one of the categorical columns, 'Revenue', serves as the target.

Administrative, Administrative_Duration, Informational, Informational_Duration, ProductRelated and ProductRelated_Duration represent the number of pages visited and the time spent on pages of the respective category during the session.

The Bounce Rate (the percentage of visitors who enter the site from that page and leave without triggering any other request), Exit Rate (the percentage of sessions that ended on that page, relative to all views of the page) and Page Value (the average value of a page that a user visited before completing an e-commerce transaction) features represent metrics measured by Google Analytics for each page of the e-commerce site.

The SpecialDay feature indicates the closeness of the site visiting time to a specific special day (such as Mother's Day or Valentine's Day).

Other attributes such as operating system, browser, region, traffic type, visitor type, weekend, and month are also available.
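
Since these Google Analytics metrics drive much of the dataset, here is a minimal sketch (toy, made-up page logs; not part of this notebook's pipeline) of how a page's bounce rate and exit rate could be computed from raw per-session view logs:

import pandas as pd

# Toy page-view log: one row per page view, in session order
log = pd.DataFrame({
    'session': [1, 1, 2, 3, 3, 3],
    'page':    ['home', 'product', 'home', 'home', 'product', 'cart'],
})

# Bounce rate of 'home': share of sessions that entered at 'home'
# and left without viewing any other page
entry = log.groupby('session')['page'].first()
views = log.groupby('session')['page'].size()
bounce_rate = (views[entry == 'home'] == 1).mean()

# Exit rate of 'home': share of all views of 'home' that were
# the last page of their session
exits = log.groupby('session')['page'].last()
exit_rate = (exits == 'home').sum() / (log['page'] == 'home').sum()

print(bounce_rate, exit_rate)    # 0.33..., 0.33...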

In [2]:
df = pd.read_csv('https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/online_shoppers_intention.csv')
df
Out[2]:
Administrative Administrative_Duration Informational Informational_Duration ProductRelated ProductRelated_Duration BounceRates ExitRates PageValues SpecialDay Month OperatingSystems Browser Region TrafficType VisitorType Weekend Revenue
0 0.0 0.0 0.0 0.0 1.0 0.000000 0.200000 0.200000 0.000000 0.0 Feb 1 1 1 1 Returning_Visitor False False
1 0.0 0.0 0.0 0.0 2.0 64.000000 0.000000 0.100000 0.000000 0.0 Feb 2 2 1 2 Returning_Visitor False False
2 0.0 -1.0 0.0 -1.0 1.0 -1.000000 0.200000 0.200000 0.000000 0.0 Feb 4 1 9 3 Returning_Visitor False False
3 0.0 0.0 0.0 0.0 2.0 2.666667 0.050000 0.140000 0.000000 0.0 Feb 3 2 2 4 Returning_Visitor False False
4 0.0 0.0 0.0 0.0 10.0 627.500000 0.020000 0.050000 0.000000 0.0 Feb 3 3 1 4 Returning_Visitor True False
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
12325 3.0 145.0 0.0 0.0 53.0 1783.791667 0.007143 0.029031 12.241717 0.0 Dec 4 6 1 1 Returning_Visitor True False
12326 0.0 0.0 0.0 0.0 5.0 465.750000 0.000000 0.021333 0.000000 0.0 Nov 3 2 1 8 Returning_Visitor True False
12327 0.0 0.0 0.0 0.0 6.0 184.250000 0.083333 0.086667 0.000000 0.0 Nov 3 2 1 13 Returning_Visitor True False
12328 4.0 75.0 0.0 0.0 15.0 346.000000 0.000000 0.021053 0.000000 0.0 Nov 2 2 3 11 Returning_Visitor False False
12329 0.0 0.0 0.0 0.0 3.0 21.250000 0.000000 0.066667 0.000000 0.0 Nov 3 2 1 2 New_Visitor True False

12330 rows × 18 columns

Looking into the columns in the data frame

In [3]:
df.columns
Out[3]:
Index(['Administrative', 'Administrative_Duration', 'Informational',
       'Informational_Duration', 'ProductRelated', 'ProductRelated_Duration',
       'BounceRates', 'ExitRates', 'PageValues', 'SpecialDay', 'Month',
       'OperatingSystems', 'Browser', 'Region', 'TrafficType', 'VisitorType',
       'Weekend', 'Revenue'],
      dtype='object')

Defining the numeric columns for standardization later

In [4]:
numeric_columns = ['Administrative', 'Administrative_Duration', 'Informational',
       'Informational_Duration', 'ProductRelated', 'ProductRelated_Duration',
       'BounceRates', 'ExitRates', 'PageValues', 'SpecialDay']

Checking for NaN values

...and dropping them.

In [5]:
print(df.isna().sum())

df = df.dropna()
Administrative             14
Administrative_Duration    14
Informational              14
Informational_Duration     14
ProductRelated             14
ProductRelated_Duration    14
BounceRates                14
ExitRates                  14
PageValues                  0
SpecialDay                  0
Month                       0
OperatingSystems            0
Browser                     0
Region                      0
TrafficType                 0
VisitorType                 0
Weekend                     0
Revenue                     0
dtype: int64

A peek into the class label distribution

In [6]:
df['Revenue'].value_counts()
Out[6]:
False    10408
True      1908
Name: Revenue, dtype: int64

It's not balanced (only about 15% of the samples belong to the positive class), but let us see how our model performs on this data.
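
If the imbalance were to hurt recall on the rare positive class, one common remedy is class weighting; a minimal sketch (not used in this notebook) of how per-class weights could be passed to Keras:

neg = int((~df['Revenue']).sum())        # majority class (False) count
pos = int(df['Revenue'].sum())           # minority class (True) count
class_weight = {0: 1.0, 1: neg / pos}    # upweight the rare positive class
# model.fit(..., class_weight = class_weight)   # would be passed at training time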

A peek into the values in the 'Month' column

In [7]:
df['Month'].value_counts()
Out[7]:
May     3363
Nov     2998
Mar     1894
Dec     1727
Oct      549
Sep      448
Aug      433
Jul      432
June     288
Feb      184
Name: Month, dtype: int64

Only 10 of the 12 months appear in the data frame. The Month column therefore needs to be one-hot encoded against all 12 calendar months, not just those present in the data.

In [8]:
# Convert binary to int
df['Weekend'] = df['Weekend'].astype('int64')
df['Revenue'] = df['Revenue'].astype('int64')

# One hot encoding 
dummy_columns = ['OperatingSystems','Browser','Region','TrafficType','VisitorType']

for column in dummy_columns:
    df_dummies = pd.get_dummies(df[column], drop_first = True, prefix = column+"_")    
    df = pd.concat([df, df_dummies], axis = 1)
    
df = df.drop(columns = dummy_columns)

# Accounting for all months in the calendar
months = ['Jan','Feb','Mar','Apr','May','June','Jul','Aug','Sep','Oct','Nov','Dec']

for mx in months[1:]:    # drop_first = True
    df[mx] = (df['Month'] == mx).astype('int64')

df = df.drop(columns = ['Month'])
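
An equivalent idiom for the month encoding above (a sketch, shown commented out since 'Month' has already been dropped at this point): declaring the full calendar as a pandas categorical makes get_dummies emit a column for every month, observed or not.

# month_cat = pd.Categorical(df['Month'], categories = months)
# month_dummies = pd.get_dummies(month_cat, prefix = 'Month', drop_first = True)
# df = pd.concat([df.drop(columns = ['Month']), month_dummies], axis = 1)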

Defining input and output columns

In [9]:
input_columns = df.columns.tolist()
input_columns.remove('Revenue')

output_columns = ['Revenue']

Train-val-test split based on 80-10-10 ratio

In [10]:
# Splitting into train, val and test set -- 80-10-10 split

# First, an 80-20 split
train_df, val_test_df = train_test_split(df, test_size = 0.2)

# Then split the 20% into half
val_df, test_df = train_test_split(val_test_df, test_size = 0.5)

print("Number of samples in...")
print("Training set: ", len(train_df))
print("Validation set: ", len(val_df))
print("Testing set: ", len(test_df))
Number of samples in...
Training set:  9852
Validation set:  1232
Testing set:  1232

Standardizing the numeric column values

In [11]:
ss = StandardScaler()

# Fit the scaler on the training set only, then apply the same transformation
# to the validation and test sets to avoid data leakage
train_df[numeric_columns] = ss.fit_transform(train_df[numeric_columns])
val_df[numeric_columns] = ss.transform(val_df[numeric_columns])
test_df[numeric_columns] = ss.transform(test_df[numeric_columns])
In [12]:
# Splitting into X (input) and y (output)

Xtrain, ytrain = np.array(train_df[input_columns]), np.array(train_df[output_columns])

Xval, yval = np.array(val_df[input_columns]), np.array(val_df[output_columns])

Xtest, ytest = np.array(test_df[input_columns]).astype('float16'), np.array(test_df[output_columns])

The model

In [13]:
model = models.Sequential([
    layers.Dense(16, activation = 'relu', input_shape = Xtrain[0].shape),
    layers.Dense(8, activation = 'relu'),
    layers.Dense(1, activation = 'sigmoid')
])

# Stop training once val_loss fails to improve for 5 consecutive epochs,
# restoring the best weights seen so far
cb = callbacks.EarlyStopping(patience = 5, restore_best_weights = True)
In [14]:
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 16)                1136      
_________________________________________________________________
dense_1 (Dense)              (None, 8)                 136       
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 9         
=================================================================
Total params: 1,281
Trainable params: 1,281
Non-trainable params: 0
_________________________________________________________________
In [15]:
model.compile(optimizer = optimizers.Adam(0.0001), loss = losses.BinaryCrossentropy(), metrics = ['accuracy'])

history = model.fit(Xtrain, ytrain, validation_data = (Xval, yval), epochs = 256, callbacks = cb)
Epoch 1/256
308/308 [==============================] - 1s 3ms/step - loss: 0.5524 - accuracy: 0.8165 - val_loss: 0.4806 - val_accuracy: 0.8450
Epoch 2/256
308/308 [==============================] - 1s 2ms/step - loss: 0.4279 - accuracy: 0.8470 - val_loss: 0.3911 - val_accuracy: 0.8506
Epoch 3/256
308/308 [==============================] - 1s 2ms/step - loss: 0.3674 - accuracy: 0.8536 - val_loss: 0.3582 - val_accuracy: 0.8523
Epoch 4/256
308/308 [==============================] - 1s 2ms/step - loss: 0.3419 - accuracy: 0.8584 - val_loss: 0.3425 - val_accuracy: 0.8571
Epoch 5/256
308/308 [==============================] - 1s 2ms/step - loss: 0.3260 - accuracy: 0.8653 - val_loss: 0.3311 - val_accuracy: 0.8644
Epoch 6/256
308/308 [==============================] - 1s 2ms/step - loss: 0.3131 - accuracy: 0.8717 - val_loss: 0.3212 - val_accuracy: 0.8644
Epoch 7/256
308/308 [==============================] - 1s 2ms/step - loss: 0.3024 - accuracy: 0.8744 - val_loss: 0.3126 - val_accuracy: 0.8677
Epoch 8/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2935 - accuracy: 0.8795 - val_loss: 0.3056 - val_accuracy: 0.8742
Epoch 9/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2861 - accuracy: 0.8825 - val_loss: 0.2997 - val_accuracy: 0.8782
Epoch 10/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2801 - accuracy: 0.8855 - val_loss: 0.2948 - val_accuracy: 0.8782
Epoch 11/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2752 - accuracy: 0.8877 - val_loss: 0.2906 - val_accuracy: 0.8815
Epoch 12/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2711 - accuracy: 0.8884 - val_loss: 0.2875 - val_accuracy: 0.8831
Epoch 13/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2676 - accuracy: 0.8908 - val_loss: 0.2851 - val_accuracy: 0.8839
Epoch 14/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2648 - accuracy: 0.8913 - val_loss: 0.2831 - val_accuracy: 0.8856
Epoch 15/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2624 - accuracy: 0.8919 - val_loss: 0.2813 - val_accuracy: 0.8847
Epoch 16/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2603 - accuracy: 0.8925 - val_loss: 0.2799 - val_accuracy: 0.8831
Epoch 17/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2585 - accuracy: 0.8920 - val_loss: 0.2785 - val_accuracy: 0.8847
Epoch 18/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2568 - accuracy: 0.8927 - val_loss: 0.2773 - val_accuracy: 0.8856
Epoch 19/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2554 - accuracy: 0.8931 - val_loss: 0.2765 - val_accuracy: 0.8856
Epoch 20/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2540 - accuracy: 0.8937 - val_loss: 0.2756 - val_accuracy: 0.8856
Epoch 21/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2528 - accuracy: 0.8934 - val_loss: 0.2746 - val_accuracy: 0.8864
Epoch 22/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2516 - accuracy: 0.8940 - val_loss: 0.2739 - val_accuracy: 0.8864
Epoch 23/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2506 - accuracy: 0.8946 - val_loss: 0.2734 - val_accuracy: 0.8872
Epoch 24/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2496 - accuracy: 0.8949 - val_loss: 0.2730 - val_accuracy: 0.8880
Epoch 25/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2487 - accuracy: 0.8944 - val_loss: 0.2723 - val_accuracy: 0.8880
Epoch 26/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2478 - accuracy: 0.8951 - val_loss: 0.2720 - val_accuracy: 0.8880
Epoch 27/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2470 - accuracy: 0.8957 - val_loss: 0.2719 - val_accuracy: 0.8888
Epoch 28/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2462 - accuracy: 0.8963 - val_loss: 0.2713 - val_accuracy: 0.8888
Epoch 29/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2454 - accuracy: 0.8960 - val_loss: 0.2709 - val_accuracy: 0.8888
Epoch 30/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2446 - accuracy: 0.8963 - val_loss: 0.2706 - val_accuracy: 0.8880
Epoch 31/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2439 - accuracy: 0.8964 - val_loss: 0.2702 - val_accuracy: 0.8872
Epoch 32/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2431 - accuracy: 0.8968 - val_loss: 0.2697 - val_accuracy: 0.8880
Epoch 33/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2425 - accuracy: 0.8965 - val_loss: 0.2695 - val_accuracy: 0.8872
Epoch 34/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2418 - accuracy: 0.8972 - val_loss: 0.2692 - val_accuracy: 0.8864
Epoch 35/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2411 - accuracy: 0.8974 - val_loss: 0.2686 - val_accuracy: 0.8856
Epoch 36/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2404 - accuracy: 0.8976 - val_loss: 0.2685 - val_accuracy: 0.8864
Epoch 37/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2398 - accuracy: 0.8974 - val_loss: 0.2683 - val_accuracy: 0.8864
Epoch 38/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2391 - accuracy: 0.8976 - val_loss: 0.2681 - val_accuracy: 0.8856
Epoch 39/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2385 - accuracy: 0.8974 - val_loss: 0.2675 - val_accuracy: 0.8880
Epoch 40/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2379 - accuracy: 0.8979 - val_loss: 0.2672 - val_accuracy: 0.8872
Epoch 41/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2372 - accuracy: 0.8974 - val_loss: 0.2667 - val_accuracy: 0.8880
Epoch 42/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2366 - accuracy: 0.8978 - val_loss: 0.2666 - val_accuracy: 0.8872
Epoch 43/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2361 - accuracy: 0.8983 - val_loss: 0.2660 - val_accuracy: 0.8880
Epoch 44/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2355 - accuracy: 0.8983 - val_loss: 0.2657 - val_accuracy: 0.8888
Epoch 45/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2349 - accuracy: 0.8990 - val_loss: 0.2658 - val_accuracy: 0.8864
Epoch 46/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2345 - accuracy: 0.8994 - val_loss: 0.2652 - val_accuracy: 0.8888
Epoch 47/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2340 - accuracy: 0.8990 - val_loss: 0.2650 - val_accuracy: 0.8880
Epoch 48/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2336 - accuracy: 0.9001 - val_loss: 0.2648 - val_accuracy: 0.8888
Epoch 49/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2330 - accuracy: 0.9001 - val_loss: 0.2645 - val_accuracy: 0.8888
Epoch 50/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2326 - accuracy: 0.9006 - val_loss: 0.2643 - val_accuracy: 0.8888
Epoch 51/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2322 - accuracy: 0.9009 - val_loss: 0.2642 - val_accuracy: 0.8888
Epoch 52/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2318 - accuracy: 0.9012 - val_loss: 0.2643 - val_accuracy: 0.8888
Epoch 53/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2314 - accuracy: 0.9013 - val_loss: 0.2637 - val_accuracy: 0.8888
Epoch 54/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2310 - accuracy: 0.9009 - val_loss: 0.2638 - val_accuracy: 0.8888
Epoch 55/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2306 - accuracy: 0.9008 - val_loss: 0.2632 - val_accuracy: 0.8896
Epoch 56/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2302 - accuracy: 0.9008 - val_loss: 0.2631 - val_accuracy: 0.8904
Epoch 57/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2298 - accuracy: 0.9019 - val_loss: 0.2633 - val_accuracy: 0.8896
Epoch 58/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2294 - accuracy: 0.9019 - val_loss: 0.2629 - val_accuracy: 0.8904
Epoch 59/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2290 - accuracy: 0.9022 - val_loss: 0.2627 - val_accuracy: 0.8904
Epoch 60/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2286 - accuracy: 0.9023 - val_loss: 0.2624 - val_accuracy: 0.8929
Epoch 61/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2283 - accuracy: 0.9029 - val_loss: 0.2623 - val_accuracy: 0.8929
Epoch 62/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2280 - accuracy: 0.9031 - val_loss: 0.2623 - val_accuracy: 0.8920
Epoch 63/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2276 - accuracy: 0.9030 - val_loss: 0.2621 - val_accuracy: 0.8920
Epoch 64/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2272 - accuracy: 0.9034 - val_loss: 0.2619 - val_accuracy: 0.8929
Epoch 65/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2269 - accuracy: 0.9033 - val_loss: 0.2615 - val_accuracy: 0.8945
Epoch 66/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2266 - accuracy: 0.9035 - val_loss: 0.2616 - val_accuracy: 0.8945
Epoch 67/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2263 - accuracy: 0.9046 - val_loss: 0.2613 - val_accuracy: 0.8953
Epoch 68/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2259 - accuracy: 0.9041 - val_loss: 0.2610 - val_accuracy: 0.8953
Epoch 69/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2256 - accuracy: 0.9039 - val_loss: 0.2609 - val_accuracy: 0.8953
Epoch 70/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2253 - accuracy: 0.9043 - val_loss: 0.2609 - val_accuracy: 0.8945
Epoch 71/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2250 - accuracy: 0.9042 - val_loss: 0.2605 - val_accuracy: 0.8945
Epoch 72/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2247 - accuracy: 0.9042 - val_loss: 0.2604 - val_accuracy: 0.8953
Epoch 73/256
308/308 [==============================] - 1s 5ms/step - loss: 0.2244 - accuracy: 0.9051 - val_loss: 0.2604 - val_accuracy: 0.8945
Epoch 74/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2241 - accuracy: 0.9053 - val_loss: 0.2603 - val_accuracy: 0.8945
Epoch 75/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2238 - accuracy: 0.9055 - val_loss: 0.2603 - val_accuracy: 0.8945
Epoch 76/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2235 - accuracy: 0.9052 - val_loss: 0.2601 - val_accuracy: 0.8945
Epoch 77/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2232 - accuracy: 0.9052 - val_loss: 0.2601 - val_accuracy: 0.8945
Epoch 78/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2230 - accuracy: 0.9052 - val_loss: 0.2600 - val_accuracy: 0.8945
Epoch 79/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2227 - accuracy: 0.9057 - val_loss: 0.2599 - val_accuracy: 0.8945
Epoch 80/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2224 - accuracy: 0.9060 - val_loss: 0.2602 - val_accuracy: 0.8945
Epoch 81/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2222 - accuracy: 0.9062 - val_loss: 0.2600 - val_accuracy: 0.8945
Epoch 82/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2219 - accuracy: 0.9064 - val_loss: 0.2601 - val_accuracy: 0.8945
Epoch 83/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2217 - accuracy: 0.9061 - val_loss: 0.2598 - val_accuracy: 0.8945
Epoch 84/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2215 - accuracy: 0.9066 - val_loss: 0.2599 - val_accuracy: 0.8953
Epoch 85/256
308/308 [==============================] - 1s 3ms/step - loss: 0.2212 - accuracy: 0.9062 - val_loss: 0.2600 - val_accuracy: 0.8953
Epoch 86/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2209 - accuracy: 0.9063 - val_loss: 0.2599 - val_accuracy: 0.8945
Epoch 87/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2207 - accuracy: 0.9065 - val_loss: 0.2599 - val_accuracy: 0.8929
Epoch 88/256
308/308 [==============================] - 1s 2ms/step - loss: 0.2205 - accuracy: 0.9064 - val_loss: 0.2598 - val_accuracy: 0.8920
In [16]:
model.evaluate(Xtest, ytest)
39/39 [==============================] - 0s 985us/step - loss: 0.2431 - accuracy: 0.8994
Out[16]:
[0.24310359358787537, 0.899350643157959]
In [17]:
cm = confusion_matrix(ytest, (model.predict(Xtest)>0.5).astype('int64'))
cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]    # normalize each row (true class) to proportions

fig = plt.figure(figsize = (5, 5))
ax = fig.add_subplot(111)

for i in range(cm.shape[1]):
    for j in range(cm.shape[0]):
        if cm[i,j] > 0.8:
            clr = "white"
        else:
            clr = "black"
        ax.text(j, i, format(cm[i, j], '.2f'), horizontalalignment="center", color=clr)

_ = ax.imshow(cm, cmap=plt.cm.Blues)
ax.set_xticks(range(2))
ax.set_yticks(range(2))
ax.set_xticklabels(['False', 'True'], rotation = 90)    # confusion_matrix orders classes 0 (False), then 1 (True)
ax.set_yticklabels(['False', 'True'])
plt.xlabel('Predicted')
plt.ylabel('True')
plt.show()

Plotting the metrics

In [18]:
def plot(history, variable, variable2):
    # Plot a training metric and its validation counterpart against the epoch number
    plt.plot(range(len(history[variable])), history[variable])
    plt.plot(range(len(history[variable2])), history[variable2])
    plt.legend([variable, variable2])
    plt.title(variable)
    plt.xlabel('epoch')
In [19]:
plot(history.history, "loss", "val_loss")
In [20]:
plot(history.history, "accuracy", "val_accuracy")

Prediction

In [21]:
# pick a random sample from the test set
x = random.randint(0, len(Xtest) - 1)

output = model.predict(Xtest[x].reshape(1, -1))[0][0]
pred = (output>0.5).astype('int64')

print("Predicted: ", bool(pred), "(", output, "-->", pred, ")")   

print("True: ", bool(ytest[x]))
Predicted:  False ( 0.006745213 --> 0 )
True:  False

deepC

In [22]:
model.save('online_shopper.h5')

!deepCC online_shopper.h5
[INFO]
Reading [keras model] 'online_shopper.h5'
[SUCCESS]
Saved 'online_shopper_deepC/online_shopper.onnx'
[INFO]
Reading [onnx model] 'online_shopper_deepC/online_shopper.onnx'
[INFO]
Model info:
  ir_vesion : 4
  doc       : 
[WARNING]
[ONNX]: terminal (input/output) dense_input's shape is less than 1. Changing it to 1.
[WARNING]
[ONNX]: terminal (input/output) dense_2's shape is less than 1. Changing it to 1.
WARN (GRAPH): found operator node with the same name (dense_2) as io node.
[INFO]
Running DNNC graph sanity check ...
[SUCCESS]
Passed sanity check.
[INFO]
Writing C++ file 'online_shopper_deepC/online_shopper.cpp'
[INFO]
deepSea model files are ready in 'online_shopper_deepC/' 
[RUNNING COMMAND]
g++ -std=c++11 -O3 -fno-rtti -fno-exceptions -I. -I/opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/include -isystem /opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/packages/eigen-eigen-323c052e1731 "online_shopper_deepC/online_shopper.cpp" -D_AITS_MAIN -o "online_shopper_deepC/online_shopper.exe"
[RUNNING COMMAND]
size "online_shopper_deepC/online_shopper.exe"
   text	   data	    bss	    dec	    hex	filename
 123619	   2968	    760	 127347	  1f173	online_shopper_deepC/online_shopper.exe
[SUCCESS]
Saved model as executable "online_shopper_deepC/online_shopper.exe"
In [23]:
x = random.randint(0, len(Xtest) - 1)

np.savetxt('sample.data', Xtest[x])    # xth sample into text file

# run exe with input
!online_shopper_deepC/online_shopper.exe sample.data

output = model.predict(Xtest[x].reshape(1, -1))[0][0]
predm = (output>0.5).astype('int64')

# show predicted output
nn_out = np.loadtxt('deepSea_result_1.out')

pred = (nn_out>0.5).astype('int64')
print("Predicted (deepC): ", bool(pred), "(", nn_out, "-->", pred, ")")   
print("Predicted (model): ", bool(predm), "(", output, "-->", predm, ")")   

print("True: ", bool(ytest[x]))
writing file deepSea_result_1.out.
Predicted (deepC):  False ( 0.00285643 --> 0 )
Predicted (model):  False ( 0.0028564227 --> 0 )
True:  False