Cainvas
Model Files
fraud_detection_model.h5
keras
Model
deepSea Compiled Models
fraud_detection_model.exe
deepSea
Ubuntu

Credit Card Fraud Detection

Credit: AITS Cainvas Community

Photo by XPLAI on Dribbble

  • It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase.

  • The fraud usually occurs when someone accesses your credit or debit card numbers from unsecured websites or via an identity theft scheme to fraudulently obtain money or property. Due to its recurrence and the harm it may cause to both individuals and financial institutions, it is crucial to take preventive measures as well as identifying when a transaction is fraudulent.

Setup: Importing neccessary libraries

In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense, Dropout, BatchNormalization
from tensorflow.keras.layers import Conv1D, MaxPool1D
from tensorflow.keras.optimizers import Adam

import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler    # in order to scale data
from sklearn.metrics import classification_report,accuracy_score


import warnings as wr
wr.filterwarnings("ignore")

Reading the Dataset

  • 492 frauds out of 284,807 transactions
  • features V1 - V28 are a result of the PCA transformation and are simply numerical representations
  • "Amount" is the value in dollars of the transaction
  • "Time" variable is the amount of time that passed from the time when the first transaction took place.
  • Fraud = 1 , Not Fraud = 0

Going through the Data

In [2]:
data = pd.read_csv("https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/creditcard.csv")
data.head(5)
Out[2]:
Time V1 V2 V3 V4 V5 V6 V7 V8 V9 ... V21 V22 V23 V24 V25 V26 V27 V28 Amount Class
0 0.0 -1.359807 -0.072781 2.536347 1.378155 -0.338321 0.462388 0.239599 0.098698 0.363787 ... -0.018307 0.277838 -0.110474 0.066928 0.128539 -0.189115 0.133558 -0.021053 149.62 0
1 0.0 1.191857 0.266151 0.166480 0.448154 0.060018 -0.082361 -0.078803 0.085102 -0.255425 ... -0.225775 -0.638672 0.101288 -0.339846 0.167170 0.125895 -0.008983 0.014724 2.69 0
2 1.0 -1.358354 -1.340163 1.773209 0.379780 -0.503198 1.800499 0.791461 0.247676 -1.514654 ... 0.247998 0.771679 0.909412 -0.689281 -0.327642 -0.139097 -0.055353 -0.059752 378.66 0
3 1.0 -0.966272 -0.185226 1.792993 -0.863291 -0.010309 1.247203 0.237609 0.377436 -1.387024 ... -0.108300 0.005274 -0.190321 -1.175575 0.647376 -0.221929 0.062723 0.061458 123.50 0
4 2.0 -1.158233 0.877737 1.548718 0.403034 -0.407193 0.095921 0.592941 -0.270533 0.817739 ... -0.009431 0.798278 -0.137458 0.141267 -0.206010 0.502292 0.219422 0.215153 69.99 0

5 rows × 31 columns

In [3]:
data.shape
Out[3]:
(284807, 31)
In [4]:
data.isnull().sum()
Out[4]:
Time      0
V1        0
V2        0
V3        0
V4        0
V5        0
V6        0
V7        0
V8        0
V9        0
V10       0
V11       0
V12       0
V13       0
V14       0
V15       0
V16       0
V17       0
V18       0
V19       0
V20       0
V21       0
V22       0
V23       0
V24       0
V25       0
V26       0
V27       0
V28       0
Amount    0
Class     0
dtype: int64
In [5]:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 284807 entries, 0 to 284806
Data columns (total 31 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   Time    284807 non-null  float64
 1   V1      284807 non-null  float64
 2   V2      284807 non-null  float64
 3   V3      284807 non-null  float64
 4   V4      284807 non-null  float64
 5   V5      284807 non-null  float64
 6   V6      284807 non-null  float64
 7   V7      284807 non-null  float64
 8   V8      284807 non-null  float64
 9   V9      284807 non-null  float64
 10  V10     284807 non-null  float64
 11  V11     284807 non-null  float64
 12  V12     284807 non-null  float64
 13  V13     284807 non-null  float64
 14  V14     284807 non-null  float64
 15  V15     284807 non-null  float64
 16  V16     284807 non-null  float64
 17  V17     284807 non-null  float64
 18  V18     284807 non-null  float64
 19  V19     284807 non-null  float64
 20  V20     284807 non-null  float64
 21  V21     284807 non-null  float64
 22  V22     284807 non-null  float64
 23  V23     284807 non-null  float64
 24  V24     284807 non-null  float64
 25  V25     284807 non-null  float64
 26  V26     284807 non-null  float64
 27  V27     284807 non-null  float64
 28  V28     284807 non-null  float64
 29  Amount  284807 non-null  float64
 30  Class   284807 non-null  int64  
dtypes: float64(30), int64(1)
memory usage: 67.4 MB
In [6]:
data.Class.values
Out[6]:
array([0, 0, 0, ..., 0, 0, 0])
In [7]:
data.Class.value_counts()
Out[7]:
0    284315
1       492
Name: Class, dtype: int64

Data Visualization

In [8]:
plt.figure(figsize = (6,5))
sns.countplot(data.Class, color = "orange")
plt.show()
In [9]:
data.hist(figsize=(30,30))
plt.show()

Data Pre-processing

In [10]:
fraud = data[data.Class == 1]
In [11]:
fraud             # Each row with class = 1
Out[11]:
Time V1 V2 V3 V4 V5 V6 V7 V8 V9 ... V21 V22 V23 V24 V25 V26 V27 V28 Amount Class
541 406.0 -2.312227 1.951992 -1.609851 3.997906 -0.522188 -1.426545 -2.537387 1.391657 -2.770089 ... 0.517232 -0.035049 -0.465211 0.320198 0.044519 0.177840 0.261145 -0.143276 0.00 1
623 472.0 -3.043541 -3.157307 1.088463 2.288644 1.359805 -1.064823 0.325574 -0.067794 -0.270953 ... 0.661696 0.435477 1.375966 -0.293803 0.279798 -0.145362 -0.252773 0.035764 529.00 1
4920 4462.0 -2.303350 1.759247 -0.359745 2.330243 -0.821628 -0.075788 0.562320 -0.399147 -0.238253 ... -0.294166 -0.932391 0.172726 -0.087330 -0.156114 -0.542628 0.039566 -0.153029 239.93 1
6108 6986.0 -4.397974 1.358367 -2.592844 2.679787 -1.128131 -1.706536 -3.496197 -0.248778 -0.247768 ... 0.573574 0.176968 -0.436207 -0.053502 0.252405 -0.657488 -0.827136 0.849573 59.00 1
6329 7519.0 1.234235 3.019740 -4.304597 4.732795 3.624201 -1.357746 1.713445 -0.496358 -1.282858 ... -0.379068 -0.704181 -0.656805 -1.632653 1.488901 0.566797 -0.010016 0.146793 1.00 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
279863 169142.0 -1.927883 1.125653 -4.518331 1.749293 -1.566487 -2.010494 -0.882850 0.697211 -2.064945 ... 0.778584 -0.319189 0.639419 -0.294885 0.537503 0.788395 0.292680 0.147968 390.00 1
280143 169347.0 1.378559 1.289381 -5.004247 1.411850 0.442581 -1.326536 -1.413170 0.248525 -1.127396 ... 0.370612 0.028234 -0.145640 -0.081049 0.521875 0.739467 0.389152 0.186637 0.76 1
280149 169351.0 -0.676143 1.126366 -2.213700 0.468308 -1.120541 -0.003346 -2.234739 1.210158 -0.652250 ... 0.751826 0.834108 0.190944 0.032070 -0.739695 0.471111 0.385107 0.194361 77.89 1
281144 169966.0 -3.113832 0.585864 -5.399730 1.817092 -0.840618 -2.943548 -2.208002 1.058733 -1.632333 ... 0.583276 -0.269209 -0.456108 -0.183659 -0.328168 0.606116 0.884876 -0.253700 245.00 1
281674 170348.0 1.991976 0.158476 -2.583441 0.408670 1.151147 -0.096695 0.223050 -0.068384 0.577829 ... -0.164350 -0.295135 -0.072173 -0.450261 0.313267 -0.289617 0.002988 -0.015309 42.53 1

492 rows × 31 columns

In [12]:
non_fraud = data[data.Class == 0]
In [13]:
non_fraud           # Each row with class = 0
Out[13]:
Time V1 V2 V3 V4 V5 V6 V7 V8 V9 ... V21 V22 V23 V24 V25 V26 V27 V28 Amount Class
0 0.0 -1.359807 -0.072781 2.536347 1.378155 -0.338321 0.462388 0.239599 0.098698 0.363787 ... -0.018307 0.277838 -0.110474 0.066928 0.128539 -0.189115 0.133558 -0.021053 149.62 0
1 0.0 1.191857 0.266151 0.166480 0.448154 0.060018 -0.082361 -0.078803 0.085102 -0.255425 ... -0.225775 -0.638672 0.101288 -0.339846 0.167170 0.125895 -0.008983 0.014724 2.69 0
2 1.0 -1.358354 -1.340163 1.773209 0.379780 -0.503198 1.800499 0.791461 0.247676 -1.514654 ... 0.247998 0.771679 0.909412 -0.689281 -0.327642 -0.139097 -0.055353 -0.059752 378.66 0
3 1.0 -0.966272 -0.185226 1.792993 -0.863291 -0.010309 1.247203 0.237609 0.377436 -1.387024 ... -0.108300 0.005274 -0.190321 -1.175575 0.647376 -0.221929 0.062723 0.061458 123.50 0
4 2.0 -1.158233 0.877737 1.548718 0.403034 -0.407193 0.095921 0.592941 -0.270533 0.817739 ... -0.009431 0.798278 -0.137458 0.141267 -0.206010 0.502292 0.219422 0.215153 69.99 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
284802 172786.0 -11.881118 10.071785 -9.834783 -2.066656 -5.364473 -2.606837 -4.918215 7.305334 1.914428 ... 0.213454 0.111864 1.014480 -0.509348 1.436807 0.250034 0.943651 0.823731 0.77 0
284803 172787.0 -0.732789 -0.055080 2.035030 -0.738589 0.868229 1.058415 0.024330 0.294869 0.584800 ... 0.214205 0.924384 0.012463 -1.016226 -0.606624 -0.395255 0.068472 -0.053527 24.79 0
284804 172788.0 1.919565 -0.301254 -3.249640 -0.557828 2.630515 3.031260 -0.296827 0.708417 0.432454 ... 0.232045 0.578229 -0.037501 0.640134 0.265745 -0.087371 0.004455 -0.026561 67.88 0
284805 172788.0 -0.240440 0.530483 0.702510 0.689799 -0.377961 0.623708 -0.686180 0.679145 0.392087 ... 0.265245 0.800049 -0.163298 0.123205 -0.569159 0.546668 0.108821 0.104533 10.00 0
284806 172792.0 -0.533413 -0.189733 0.703337 -0.506271 -0.012546 -0.649617 1.577006 -0.414650 0.486180 ... 0.261057 0.643078 0.376777 0.008797 -0.473649 -0.818267 -0.002415 0.013649 217.00 0

284315 rows × 31 columns

In [14]:
print("Shape of fraud data:", fraud.shape)
print("Shape of non-fraus data:", non_fraud.shape)
Shape of fraud data: (492, 31)
Shape of non-fraus data: (284315, 31)

Balancing the Dataset

In [15]:
nan_fraud_balanced = non_fraud.sample(4000)
In [16]:
nan_fraud_balanced
Out[16]:
Time V1 V2 V3 V4 V5 V6 V7 V8 V9 ... V21 V22 V23 V24 V25 V26 V27 V28 Amount Class
241010 150843.0 -0.629280 -0.152634 1.460563 -2.982212 -0.589086 -0.940765 0.144343 -0.157058 -2.242747 ... -0.210460 -0.117587 -0.384878 0.030243 0.785383 -0.107176 0.259530 0.106702 15.00 0
239551 150158.0 1.848873 -0.599978 -0.402188 0.347994 -0.552613 0.125880 -0.680864 0.152062 1.157818 ... 0.273089 0.872919 0.103006 0.796296 -0.123302 -0.274757 0.025474 -0.030387 57.00 0
12057 20887.0 0.898614 0.033896 -0.083297 1.256326 0.554091 0.702342 0.155668 0.196360 1.160759 ... 0.031511 0.278743 -0.027337 -0.708803 0.393032 -0.231545 0.011633 0.005579 89.99 0
204669 135388.0 2.170257 -0.432743 -2.005738 -0.239275 -0.010736 -1.463763 0.410082 -0.594424 -0.718220 ... -0.219190 0.030104 -0.041832 -0.093620 0.469135 -0.063660 -0.030311 -0.065545 32.45 0
16841 28223.0 0.009634 -0.061660 1.251075 -1.835248 -0.356961 -0.357958 -0.179298 -0.066580 -0.882863 ... 0.543247 1.548464 -0.070185 -0.386779 -0.895966 -0.359398 0.256092 0.236962 22.49 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
36201 38434.0 1.147906 0.595501 0.105851 2.392329 0.185198 -0.529801 0.448219 -0.106654 -1.222637 ... 0.041377 -0.011423 -0.127251 0.312113 0.671663 0.051426 -0.051282 0.001927 22.75 0
271959 164838.0 -5.509718 -1.782470 -3.021547 0.413817 -0.677521 -1.198731 0.072238 1.413111 0.290375 ... -0.510416 -0.686484 0.190361 -0.369401 0.543961 -0.035204 0.475529 -1.135751 104.42 0
165781 117659.0 2.007363 -0.004923 -1.819552 1.141820 0.672523 -0.477654 0.465920 -0.257544 0.431512 ... -0.135028 -0.348148 -0.013655 -1.111657 0.289040 -0.629029 -0.016438 -0.057582 49.65 0
257266 158079.0 -0.000833 0.888962 0.372250 -0.689232 0.494921 -0.830089 0.975008 -0.117786 -0.446621 ... -0.192908 -0.367926 0.056573 0.095164 -0.501745 0.112045 0.249591 0.089999 2.97 0
253853 156459.0 -5.937674 3.810544 -1.724636 -1.739784 -1.959184 0.923363 -3.079122 -3.786873 1.555129 ... 5.963907 -0.875805 0.609910 -0.592334 0.117782 -0.498742 -1.778925 -0.282935 19.84 0

4000 rows × 31 columns

In [17]:
balanced_data = fraud.append(nan_fraud_balanced, ignore_index = True)
In [18]:
balanced_data     # 492 of them Class = 1 (fraud), 492 of them Class = 0 (nan_fraud)
Out[18]:
Time V1 V2 V3 V4 V5 V6 V7 V8 V9 ... V21 V22 V23 V24 V25 V26 V27 V28 Amount Class
0 406.0 -2.312227 1.951992 -1.609851 3.997906 -0.522188 -1.426545 -2.537387 1.391657 -2.770089 ... 0.517232 -0.035049 -0.465211 0.320198 0.044519 0.177840 0.261145 -0.143276 0.00 1
1 472.0 -3.043541 -3.157307 1.088463 2.288644 1.359805 -1.064823 0.325574 -0.067794 -0.270953 ... 0.661696 0.435477 1.375966 -0.293803 0.279798 -0.145362 -0.252773 0.035764 529.00 1
2 4462.0 -2.303350 1.759247 -0.359745 2.330243 -0.821628 -0.075788 0.562320 -0.399147 -0.238253 ... -0.294166 -0.932391 0.172726 -0.087330 -0.156114 -0.542628 0.039566 -0.153029 239.93 1
3 6986.0 -4.397974 1.358367 -2.592844 2.679787 -1.128131 -1.706536 -3.496197 -0.248778 -0.247768 ... 0.573574 0.176968 -0.436207 -0.053502 0.252405 -0.657488 -0.827136 0.849573 59.00 1
4 7519.0 1.234235 3.019740 -4.304597 4.732795 3.624201 -1.357746 1.713445 -0.496358 -1.282858 ... -0.379068 -0.704181 -0.656805 -1.632653 1.488901 0.566797 -0.010016 0.146793 1.00 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4487 38434.0 1.147906 0.595501 0.105851 2.392329 0.185198 -0.529801 0.448219 -0.106654 -1.222637 ... 0.041377 -0.011423 -0.127251 0.312113 0.671663 0.051426 -0.051282 0.001927 22.75 0
4488 164838.0 -5.509718 -1.782470 -3.021547 0.413817 -0.677521 -1.198731 0.072238 1.413111 0.290375 ... -0.510416 -0.686484 0.190361 -0.369401 0.543961 -0.035204 0.475529 -1.135751 104.42 0
4489 117659.0 2.007363 -0.004923 -1.819552 1.141820 0.672523 -0.477654 0.465920 -0.257544 0.431512 ... -0.135028 -0.348148 -0.013655 -1.111657 0.289040 -0.629029 -0.016438 -0.057582 49.65 0
4490 158079.0 -0.000833 0.888962 0.372250 -0.689232 0.494921 -0.830089 0.975008 -0.117786 -0.446621 ... -0.192908 -0.367926 0.056573 0.095164 -0.501745 0.112045 0.249591 0.089999 2.97 0
4491 156459.0 -5.937674 3.810544 -1.724636 -1.739784 -1.959184 0.923363 -3.079122 -3.786873 1.555129 ... 5.963907 -0.875805 0.609910 -0.592334 0.117782 -0.498742 -1.778925 -0.282935 19.84 0

4492 rows × 31 columns

In [19]:
balanced_data.Class.value_counts()
Out[19]:
0    4000
1     492
Name: Class, dtype: int64
In [20]:
x = balanced_data.drop("Class", axis = 1)
x                                           # dataset without Class column
Out[20]:
Time V1 V2 V3 V4 V5 V6 V7 V8 V9 ... V20 V21 V22 V23 V24 V25 V26 V27 V28 Amount
0 406.0 -2.312227 1.951992 -1.609851 3.997906 -0.522188 -1.426545 -2.537387 1.391657 -2.770089 ... 0.126911 0.517232 -0.035049 -0.465211 0.320198 0.044519 0.177840 0.261145 -0.143276 0.00
1 472.0 -3.043541 -3.157307 1.088463 2.288644 1.359805 -1.064823 0.325574 -0.067794 -0.270953 ... 2.102339 0.661696 0.435477 1.375966 -0.293803 0.279798 -0.145362 -0.252773 0.035764 529.00
2 4462.0 -2.303350 1.759247 -0.359745 2.330243 -0.821628 -0.075788 0.562320 -0.399147 -0.238253 ... -0.430022 -0.294166 -0.932391 0.172726 -0.087330 -0.156114 -0.542628 0.039566 -0.153029 239.93
3 6986.0 -4.397974 1.358367 -2.592844 2.679787 -1.128131 -1.706536 -3.496197 -0.248778 -0.247768 ... -0.171608 0.573574 0.176968 -0.436207 -0.053502 0.252405 -0.657488 -0.827136 0.849573 59.00
4 7519.0 1.234235 3.019740 -4.304597 4.732795 3.624201 -1.357746 1.713445 -0.496358 -1.282858 ... 0.009061 -0.379068 -0.704181 -0.656805 -1.632653 1.488901 0.566797 -0.010016 0.146793 1.00
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4487 38434.0 1.147906 0.595501 0.105851 2.392329 0.185198 -0.529801 0.448219 -0.106654 -1.222637 ... -0.154733 0.041377 -0.011423 -0.127251 0.312113 0.671663 0.051426 -0.051282 0.001927 22.75
4488 164838.0 -5.509718 -1.782470 -3.021547 0.413817 -0.677521 -1.198731 0.072238 1.413111 0.290375 ... -1.116204 -0.510416 -0.686484 0.190361 -0.369401 0.543961 -0.035204 0.475529 -1.135751 104.42
4489 117659.0 2.007363 -0.004923 -1.819552 1.141820 0.672523 -0.477654 0.465920 -0.257544 0.431512 ... -0.219644 -0.135028 -0.348148 -0.013655 -1.111657 0.289040 -0.629029 -0.016438 -0.057582 49.65
4490 158079.0 -0.000833 0.888962 0.372250 -0.689232 0.494921 -0.830089 0.975008 -0.117786 -0.446621 ... 0.025105 -0.192908 -0.367926 0.056573 0.095164 -0.501745 0.112045 0.249591 0.089999 2.97
4491 156459.0 -5.937674 3.810544 -1.724636 -1.739784 -1.959184 0.923363 -3.079122 -3.786873 1.555129 ... -1.941602 5.963907 -0.875805 0.609910 -0.592334 0.117782 -0.498742 -1.778925 -0.282935 19.84

4492 rows × 30 columns

In [21]:
y = balanced_data.Class
y
Out[21]:
0       1
1       1
2       1
3       1
4       1
       ..
4487    0
4488    0
4489    0
4490    0
4491    0
Name: Class, Length: 4492, dtype: int64
In [22]:
plt.figure(figsize = (6,5))
sns.countplot(y, palette="Set2")
plt.show()

Training and Testing Part

In [23]:
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size = 0.2, random_state = 42)
In [24]:
xtrain.shape
Out[24]:
(3593, 30)
In [25]:
xtest.shape
Out[25]:
(899, 30)

Standardation

In [26]:
scaler = StandardScaler()
In [27]:
scaled_xtrain = scaler.fit_transform(xtrain)
scaled_xtest = scaler.fit_transform(xtest)
In [28]:
scaled_xtrain
Out[28]:
array([[-1.17332769, -1.37235541,  2.23151838, ...,  0.31154951,
         1.13658606,  0.07575495],
       [-1.75645899, -0.35069163,  1.21686553, ...,  0.40167155,
         0.3810731 , -0.37241978],
       [-1.00079048,  0.50775508, -0.22997529, ..., -0.00484764,
         0.07919079, -0.04853561],
       ...,
       [-0.48049822,  0.04732971, -0.35445207, ...,  0.12142507,
         0.2263101 , -0.27525453],
       [ 1.50055229,  0.13911858, -0.10283998, ...,  0.16008356,
         0.09610672, -0.33602329],
       [ 1.00571565, -0.36062296,  0.45764983, ..., -1.35342837,
        -0.87955646, -0.37335094]])
In [29]:
type(scaled_xtrain)
Out[29]:
numpy.ndarray
In [30]:
scaled_xtest
Out[30]:
array([[-1.01865299, -0.10765405,  0.37962278, ...,  0.02900122,
         0.55225096, -0.39597383],
       [ 0.79958346,  0.17355465,  0.18048943, ..., -0.06427317,
        -0.26815135, -0.38207544],
       [ 1.18634331, -0.74736784,  0.83935369, ..., -2.78350734,
        -0.90032522, -0.42222095],
       ...,
       [-1.18694025,  0.54158102, -0.8519865 , ..., -0.01174423,
         0.11124511,  0.08402896],
       [ 0.68013727, -0.1216719 , -0.22339051, ...,  0.08867429,
         0.55505001, -0.01621375],
       [-1.32450181,  0.43391604, -1.08571192, ...,  0.02302378,
         0.21130362,  0.77797973]])
In [31]:
type(scaled_xtest)
Out[31]:
numpy.ndarray
In [32]:
print(scaled_xtrain.shape)
print(scaled_xtest.shape)
(3593, 30)
(899, 30)
In [33]:
print(ytrain.shape)
print(ytest.shape)
(3593,)
(899,)
In [34]:
190820+93987 # Total dataset rows
Out[34]:
284807

3D Format

In [35]:
scaled_xtrain3d = scaled_xtrain.reshape(scaled_xtrain.shape[0],scaled_xtrain.shape[1],1)
scaled_xtest3d = scaled_xtest.reshape(scaled_xtest.shape[0],scaled_xtest.shape[1],1)

scaled_xtrain3d.shape, scaled_xtest3d.shape
Out[35]:
((3593, 30, 1), (899, 30, 1))

Network Building

In [36]:
# First Layer:

cnn = Sequential()
cnn.add(Conv1D(32, 2, activation = "relu", input_shape = (30,1)))
cnn.add(Dropout(0.1))
In [37]:
# Second Layer:

cnn.add(BatchNormalization()) # Batch normalization is a technique for training very deep neural networks 
                               # that standardizes the inputs to a layer for each mini-batch. This 
                               # has the effect of stabilizing the learning process and dramatically
                               # reducing the number of training epochs required to train deep networks

cnn.add(Conv1D(64, 2, activation = "relu"))
cnn.add(Dropout(0.2))          # prevents over-fitting (randomly remove some neurons)
In [38]:
# Flattening Layer:

cnn.add(Flatten())
cnn.add(Dropout(0.4))
cnn.add(Dense(64, activation = "relu"))
cnn.add(Dropout(0.5))
In [39]:
# Last Layer:

cnn.add(Dense(1, activation = "sigmoid"))
In [40]:
cnn.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv1d (Conv1D)              (None, 29, 32)            96        
_________________________________________________________________
dropout (Dropout)            (None, 29, 32)            0         
_________________________________________________________________
batch_normalization (BatchNo (None, 29, 32)            128       
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 28, 64)            4160      
_________________________________________________________________
dropout_1 (Dropout)          (None, 28, 64)            0         
_________________________________________________________________
flatten (Flatten)            (None, 1792)              0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 1792)              0         
_________________________________________________________________
dense (Dense)                (None, 64)                114752    
_________________________________________________________________
dropout_3 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 65        
=================================================================
Total params: 119,201
Trainable params: 119,137
Non-trainable params: 64
_________________________________________________________________
In [41]:
cnn.compile(optimizer = Adam(lr=0.0001), loss = "binary_crossentropy", metrics = ["accuracy"])

Training

In [42]:
history = cnn.fit(scaled_xtrain3d, ytrain, epochs = 20, validation_data=(scaled_xtest3d, ytest), verbose=1)
Epoch 1/20
113/113 [==============================] - 1s 5ms/step - loss: 0.2921 - accuracy: 0.9031 - val_loss: 0.3891 - val_accuracy: 0.9766
Epoch 2/20
113/113 [==============================] - 0s 3ms/step - loss: 0.1357 - accuracy: 0.9674 - val_loss: 0.1858 - val_accuracy: 0.9822
Epoch 3/20
113/113 [==============================] - 0s 3ms/step - loss: 0.1142 - accuracy: 0.9750 - val_loss: 0.0940 - val_accuracy: 0.9822
Epoch 4/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0972 - accuracy: 0.9752 - val_loss: 0.0707 - val_accuracy: 0.9833
Epoch 5/20
113/113 [==============================] - 0s 4ms/step - loss: 0.1003 - accuracy: 0.9750 - val_loss: 0.0621 - val_accuracy: 0.9833
Epoch 6/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0954 - accuracy: 0.9758 - val_loss: 0.0596 - val_accuracy: 0.9844
Epoch 7/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0884 - accuracy: 0.9775 - val_loss: 0.0594 - val_accuracy: 0.9833
Epoch 8/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0816 - accuracy: 0.9775 - val_loss: 0.0585 - val_accuracy: 0.9867
Epoch 9/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0889 - accuracy: 0.9772 - val_loss: 0.0573 - val_accuracy: 0.9867
Epoch 10/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0806 - accuracy: 0.9794 - val_loss: 0.0561 - val_accuracy: 0.9844
Epoch 11/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0862 - accuracy: 0.9780 - val_loss: 0.0555 - val_accuracy: 0.9867
Epoch 12/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0842 - accuracy: 0.9777 - val_loss: 0.0557 - val_accuracy: 0.9867
Epoch 13/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0889 - accuracy: 0.9777 - val_loss: 0.0557 - val_accuracy: 0.9867
Epoch 14/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0808 - accuracy: 0.9788 - val_loss: 0.0565 - val_accuracy: 0.9878
Epoch 15/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0774 - accuracy: 0.9775 - val_loss: 0.0556 - val_accuracy: 0.9878
Epoch 16/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0789 - accuracy: 0.9786 - val_loss: 0.0568 - val_accuracy: 0.9878
Epoch 17/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0781 - accuracy: 0.9786 - val_loss: 0.0556 - val_accuracy: 0.9867
Epoch 18/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0742 - accuracy: 0.9800 - val_loss: 0.0553 - val_accuracy: 0.9855
Epoch 19/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0750 - accuracy: 0.9786 - val_loss: 0.0550 - val_accuracy: 0.9867
Epoch 20/20
113/113 [==============================] - 0s 3ms/step - loss: 0.0684 - accuracy: 0.9802 - val_loss: 0.0545 - val_accuracy: 0.9867
In [43]:
fig, ax1 = plt.subplots(figsize= (10, 5))
plt.plot(history.history["accuracy"])
plt.plot(history.history["val_accuracy"])
plt.title("Model accuracy")
plt.ylabel("Accuracy")
plt.xlabel("Epoch")
plt.legend(["Train", "Validation"], loc = "upper left")
plt.show()
In [44]:
fig, ax1 = plt.subplots(figsize= (10, 5))
plt.plot(history.history["loss"])
plt.plot(history.history["val_loss"])
plt.title("Model loss")
plt.ylabel("Loss")
plt.xlabel("Epoch")
plt.legend(["Train", "Validation"], loc = "upper left")
plt.show()

Evaluation

In [45]:
from sklearn.metrics import confusion_matrix
cnn_predictions = cnn.predict_classes(scaled_xtest3d)
confusion_matrix = confusion_matrix(ytest, cnn_predictions)
sns.heatmap(confusion_matrix, annot=True, fmt="d", cbar = False)
plt.title("CNN Confusion Matrix")
plt.show()
WARNING:tensorflow:From <ipython-input-45-ebd5b2c41ba7>:2: Sequential.predict_classes (from tensorflow.python.keras.engine.sequential) is deprecated and will be removed after 2021-01-01.
Instructions for updating:
Please use instead:* `np.argmax(model.predict(x), axis=-1)`,   if your model does multi-class classification   (e.g. if it uses a `softmax` last-layer activation).* `(model.predict(x) > 0.5).astype("int32")`,   if your model does binary classification   (e.g. if it uses a `sigmoid` last-layer activation).
In [46]:
accuracy_score(ytest, cnn_predictions)
Out[46]:
0.9866518353726362
In [47]:
from sklearn.metrics import precision_recall_fscore_support as score
In [48]:
precision, recall, fscore, support = score(ytest, cnn_predictions)
print('precision: {}'.format(precision))
print('recall: {}'.format(recall))
print('fscore: {}'.format(fscore))
print('support: {}'.format(support))
precision: [0.98875    0.96969697]
recall: [0.99622166 0.91428571]
fscore: [0.99247177 0.94117647]
support: [794 105]
In [49]:
cnn.save('fraud_detection_model.h5')

deepCC

In [51]:
!deepCC fraud_detection_model.h5
[INFO]
Reading [keras model] 'fraud_detection_model.h5'
[SUCCESS]
Saved 'fraud_detection_model_deepC/fraud_detection_model.onnx'
[INFO]
Reading [onnx model] 'fraud_detection_model_deepC/fraud_detection_model.onnx'
[INFO]
Model info:
  ir_vesion : 4
  doc       : 
[WARNING]
[ONNX]: terminal (input/output) conv1d_input's shape is less than 1. Changing it to 1.
[WARNING]
[ONNX]: terminal (input/output) dense_1's shape is less than 1. Changing it to 1.
WARN (GRAPH): found operator node with the same name (dense_1) as io node.
[INFO]
Running DNNC graph sanity check ...
[SUCCESS]
Passed sanity check.
[INFO]
Writing C++ file 'fraud_detection_model_deepC/fraud_detection_model.cpp'
[INFO]
deepSea model files are ready in 'fraud_detection_model_deepC/' 
[RUNNING COMMAND]
g++ -std=c++11 -O3 -fno-rtti -fno-exceptions -I. -I/opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/include -isystem /opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/packages/eigen-eigen-323c052e1731 "fraud_detection_model_deepC/fraud_detection_model.cpp" -D_AITS_MAIN -o "fraud_detection_model_deepC/fraud_detection_model.exe"
[RUNNING COMMAND]
size "fraud_detection_model_deepC/fraud_detection_model.exe"
   text	   data	    bss	    dec	    hex	filename
 632603	   3784	    760	 637147	  9b8db	fraud_detection_model_deepC/fraud_detection_model.exe
[SUCCESS]
Saved model as executable "fraud_detection_model_deepC/fraud_detection_model.exe"
In [ ]: