Credit Card Fraud Detection
Credit: AITS Cainvas Community
It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase.
Fraud usually occurs when someone obtains your credit or debit card numbers from an unsecured website or through an identity theft scheme and uses them to fraudulently obtain money or property. Because it recurs often and can harm both individuals and financial institutions, it is crucial both to take preventive measures and to identify fraudulent transactions when they occur.
Setup: Importing necessary libraries
In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense, Dropout, BatchNormalization
from tensorflow.keras.layers import Conv1D, MaxPool1D
from tensorflow.keras.optimizers import Adam
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler # in order to scale data
from sklearn.metrics import classification_report,accuracy_score
import warnings as wr
wr.filterwarnings("ignore")
Reading the Dataset
- 492 frauds out of 284,807 transactions
- Features V1-V28 are the result of a PCA transformation, so they are anonymized numerical components rather than directly interpretable attributes
- "Amount" is the transaction amount
- "Time" is the number of seconds elapsed between the transaction and the first transaction in the dataset
- "Class" is the label: Fraud = 1, Not Fraud = 0
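The counts above imply a severe class imbalance. A quick back-of-the-envelope check, using only the 492 / 284,807 figures quoted above, shows why plain accuracy is a weak metric here: a trivial model that labels every transaction as non-fraud would already score roughly 99.83%.

```python
# Figures quoted in the dataset description above
n_fraud = 492
n_total = 284_807

fraud_rate = n_fraud / n_total
# Accuracy of a trivial classifier that always predicts "not fraud" (Class = 0)
always_zero_accuracy = (n_total - n_fraud) / n_total

print(f"Fraud rate: {fraud_rate:.4%}")
print(f"Always-predict-0 accuracy: {always_zero_accuracy:.4%}")
```

This is why the evaluation section below also reports precision, recall and F-score, which look at the fraud class directly.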
Going through the Data
In [2]:
data = pd.read_csv("https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/creditcard.csv")
data.head(5)
Out[2]:
In [3]:
data.shape
Out[3]:
In [4]:
data.isnull().sum()
Out[4]:
In [5]:
data.info()
In [6]:
data.Class.values
Out[6]:
In [7]:
data.Class.value_counts()
Out[7]:
Data Visualization
In [8]:
plt.figure(figsize = (6,5))
sns.countplot(x = data.Class, color = "orange") # pass the column as x; recent seaborn versions no longer accept it positionally
plt.show()
In [9]:
data.hist(figsize=(30,30))
plt.show()
Data Pre-processing
In [10]:
fraud = data[data.Class == 1]
In [11]:
fraud # Each row with class = 1
Out[11]:
In [12]:
non_fraud = data[data.Class == 0]
In [13]:
non_fraud # Each row with class = 0
Out[13]:
In [14]:
print("Shape of fraud data:", fraud.shape)
print("Shape of non-fraud data:", non_fraud.shape)
Balancing the Dataset (Undersampling)
In [15]:
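non_fraud_sample = non_fraud.sample(4000, random_state = 42) # random subset of legitimate transactions (fixed seed for reproducibility)
In [16]:
non_fraud_sample
Out[16]:
In [17]:
balanced_data = pd.concat([fraud, non_fraud_sample], ignore_index = True) # DataFrame.append was removed in pandas 2.0
In [18]:
balanced_data # 492 rows with Class = 1 (fraud), 4000 rows with Class = 0 (non-fraud)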
Out[18]:
In [19]:
balanced_data.Class.value_counts()
Out[19]:
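With the figures used in this notebook, the undersampled set holds 492 fraud rows and 4,000 non-fraud rows, so fraud makes up 492 / 4,492, or about 11%, of it: far more balanced than the original 0.17%, but still not a 50/50 split. The step can be wrapped in a small helper; `undersample` below is a hypothetical name for illustration, run here on synthetic stand-in data rather than the real CSV, and it also shuffles the result so the two classes are not stored in contiguous blocks.

```python
import numpy as np
import pandas as pd

def undersample(df, label_col, n_majority, majority_label=0, seed=42):
    """Hypothetical helper: keep every minority row plus n_majority random majority rows."""
    minority = df[df[label_col] != majority_label]
    majority = df[df[label_col] == majority_label].sample(n_majority, random_state=seed)
    out = pd.concat([minority, majority], ignore_index=True)
    # Shuffle rows so fraud and non-fraud examples are interleaved
    return out.sample(frac=1, random_state=seed).reset_index(drop=True)

# Tiny synthetic stand-in for the credit-card data: 50 positives, 5,000 negatives
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Amount": rng.exponential(100, size=5_050),
    "Class": np.r_[np.ones(50, dtype=int), np.zeros(5_000, dtype=int)],
})

balanced = undersample(toy, "Class", n_majority=400)
print(balanced["Class"].value_counts())
print("Positive share:", balanced["Class"].mean())
```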
In [20]:
x = balanced_data.drop("Class", axis = 1)
x # dataset without Class column
Out[20]:
In [21]:
y = balanced_data.Class
y
Out[21]:
In [22]:
plt.figure(figsize = (6,5))
sns.countplot(x = y, palette="Set2")
plt.show()
Train/Test Split
In [23]:
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size = 0.2, random_state = 42)
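The split above is purely random. On a set that is about 11% fraud, a random split can shift the fraud proportion somewhat between the training and test sets. scikit-learn's `train_test_split` accepts a `stratify` argument that keeps the class ratio the same in both parts; the sketch below shows the effect on synthetic labels and is an optional variant, not what this notebook does.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with a 10% positive rate, roughly like the undersampled data
X = np.arange(1_000).reshape(-1, 1)
y = np.r_[np.ones(100, dtype=int), np.zeros(900, dtype=int)]

# stratify=y keeps the positive rate (almost exactly) equal in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print("train positive rate:", y_tr.mean())
print("test positive rate:", y_te.mean())
```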
In [24]:
xtrain.shape
Out[24]:
In [25]:
xtest.shape
Out[25]:
Standardization
In [26]:
scaler = StandardScaler()
In [27]:
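scaled_xtrain = scaler.fit_transform(xtrain) # learn mean and std from the training set only
scaled_xtest = scaler.transform(xtest) # reuse the training statistics; refitting on the test set would scale it inconsistently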
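StandardScaler computes (x - mean) / std per column, with the mean and population standard deviation learned in `fit`. The sketch below, on synthetic data with a deliberately shifted test set, checks that formula and shows why the test set should be transformed with the training statistics: the shift stays visible (test means near 1) instead of being erased, as it would be if the scaler were refit on the test data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
X_test = rng.normal(loc=60.0, scale=10.0, size=(50, 3))  # deliberately shifted by one std

scaler = StandardScaler().fit(X_train)    # statistics come from the training set only
X_test_scaled = scaler.transform(X_test)  # reuse them on the test set

# Same result by hand: population std (ddof=0), as StandardScaler uses
manual = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)
print(np.allclose(X_test_scaled, manual))
print("test column means after scaling:", X_test_scaled.mean(axis=0).round(2))
```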
In [28]:
scaled_xtrain
Out[28]:
In [29]:
type(scaled_xtrain)
Out[29]:
In [30]:
scaled_xtest
Out[30]:
In [31]:
type(scaled_xtest)
Out[31]:
In [32]:
print(scaled_xtrain.shape)
print(scaled_xtest.shape)
In [33]:
print(ytrain.shape)
print(ytest.shape)
In [34]:
len(fraud) + len(non_fraud) # total rows in the full dataset (284,807)
Out[34]:
3D Format
In [35]:
scaled_xtrain3d = scaled_xtrain.reshape(scaled_xtrain.shape[0],scaled_xtrain.shape[1],1)
scaled_xtest3d = scaled_xtest.reshape(scaled_xtest.shape[0],scaled_xtest.shape[1],1)
scaled_xtrain3d.shape, scaled_xtest3d.shape
Out[35]:
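Conv1D layers expect 3D input of shape (samples, steps, channels). The reshape above treats each of the 30 features (Time, V1-V28, Amount) as one step with a single channel. NumPy offers equivalent spellings, sketched here on random stand-in data:

```python
import numpy as np

# Stand-in for the scaled feature matrix: 4 samples x 30 features
X2d = np.random.default_rng(0).normal(size=(4, 30))

# Three equivalent ways to append a trailing channel axis of size 1
a = X2d.reshape(X2d.shape[0], X2d.shape[1], 1)
b = X2d[..., np.newaxis]
c = np.expand_dims(X2d, axis=-1)

print(a.shape)  # (4, 30, 1)
print(np.array_equal(a, b) and np.array_equal(b, c))
```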
Network Building
In [36]:
# First Layer:
cnn = Sequential()
cnn.add(Conv1D(32, 2, activation = "relu", input_shape = (30,1)))
cnn.add(Dropout(0.1))
In [37]:
# Second Layer:
# Batch normalization standardizes each layer input over the current mini-batch,
# which tends to stabilize training and can reduce the number of epochs needed.
cnn.add(BatchNormalization())
cnn.add(Conv1D(64, 2, activation = "relu"))
cnn.add(Dropout(0.2)) # randomly zeroes 20% of the activations during training to reduce overfitting
In [38]:
# Flattening Layer:
cnn.add(Flatten())
cnn.add(Dropout(0.4))
cnn.add(Dense(64, activation = "relu"))
cnn.add(Dropout(0.5))
In [39]:
# Last Layer:
cnn.add(Dense(1, activation = "sigmoid"))
In [40]:
cnn.summary()
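The cnn.summary() output is not reproduced here. As a sanity check, the parameter counts can be worked out by hand, assuming the Keras defaults for Conv1D (padding "valid", stride 1), which the layers above do not override. Under those assumptions the model has 119,201 parameters, 64 of which (the batch-normalization moving mean and variance) are non-trainable; Dropout and Flatten add none.

```python
def conv1d_params(kernel, in_ch, filters):
    # one weight per (kernel position, input channel, filter) plus one bias per filter
    return kernel * in_ch * filters + filters

def dense_params(n_in, n_out):
    # full weight matrix plus one bias per output unit
    return n_in * n_out + n_out

steps, ch = 30, 1                  # input_shape = (30, 1)
p1 = conv1d_params(2, ch, 32)      # Conv1D(32, 2)
steps, ch = steps - 2 + 1, 32      # 'valid' padding, stride 1 -> 29 steps
bn = 4 * ch                        # gamma, beta, moving mean, moving variance
bn_non_trainable = 2 * ch          # the moving statistics are not trained
p2 = conv1d_params(2, ch, 64)      # Conv1D(64, 2)
steps, ch = steps - 2 + 1, 64      # -> 28 steps
flat = steps * ch                  # Flatten -> 28 * 64 = 1792
p3 = dense_params(flat, 64)        # Dense(64)
p4 = dense_params(64, 1)           # Dense(1, sigmoid)

total = p1 + bn + p2 + p3 + p4
print("total params:", total)
print("trainable:", total - bn_non_trainable, "non-trainable:", bn_non_trainable)
```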
In [41]:
cnn.compile(optimizer = Adam(learning_rate=0.0001), loss = "binary_crossentropy", metrics = ["accuracy"]) # 'lr' is deprecated; use 'learning_rate'
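The single sigmoid output is paired with binary cross-entropy, -mean(y log p + (1 - y) log(1 - p)). A NumPy sketch of that formula follows; it clips probabilities to avoid log(0), as Keras also does, but Keras's own implementation may differ in details such as batch reduction, so treat this as an illustration rather than a drop-in replacement.

```python
import numpy as np

def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)  # avoid log(0)
    y = np.asarray(y_true, dtype=float)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Sigmoid outputs for one fraud (1) and one legitimate (0) transaction
print(binary_crossentropy([1, 0], [0.9, 0.1]))  # confident and correct: -log(0.9), about 0.105
print(binary_crossentropy([1, 0], [0.1, 0.9]))  # confident and wrong: -log(0.1), about 2.303
```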
Training
In [42]:
history = cnn.fit(scaled_xtrain3d, ytrain, epochs = 20, validation_data=(scaled_xtest3d, ytest), verbose=1)
In [43]:
fig, ax1 = plt.subplots(figsize= (10, 5))
plt.plot(history.history["accuracy"])
plt.plot(history.history["val_accuracy"])
plt.title("Model accuracy")
plt.ylabel("Accuracy")
plt.xlabel("Epoch")
plt.legend(["Train", "Validation"], loc = "upper left")
plt.show()
In [44]:
fig, ax1 = plt.subplots(figsize= (10, 5))
plt.plot(history.history["loss"])
plt.plot(history.history["val_loss"])
plt.title("Model loss")
plt.ylabel("Loss")
plt.xlabel("Epoch")
plt.legend(["Train", "Validation"], loc = "upper left")
plt.show()
Evaluation
In [45]:
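from sklearn.metrics import confusion_matrix
# predict_classes no longer exists in recent TensorFlow releases: threshold the sigmoid output at 0.5 instead
cnn_predictions = (cnn.predict(scaled_xtest3d) > 0.5).astype("int32").ravel()
cm = confusion_matrix(ytest, cnn_predictions) # rows = true class, columns = predicted class
sns.heatmap(cm, annot=True, fmt="d", cbar = False)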
plt.title("CNN Confusion Matrix")
plt.show()
In [46]:
accuracy_score(ytest, cnn_predictions)
Out[46]:
In [47]:
from sklearn.metrics import precision_recall_fscore_support as score
In [48]:
precision, recall, fscore, support = score(ytest, cnn_predictions)
print('precision: {}'.format(precision))
print('recall: {}'.format(recall))
print('fscore: {}'.format(fscore))
print('support: {}'.format(support))
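precision_recall_fscore_support returns one value per class, so index 1 of each array printed above refers to fraud. The sketch below uses toy labels (not this notebook's results) to show how those numbers follow from the confusion-matrix counts.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Toy labels: 1 = fraud, 0 = not fraud
y_true = np.array([0, 0, 0, 1, 1, 1])
y_pred = np.array([0, 1, 0, 1, 1, 0])

# For labels [0, 1], rows are true classes and columns are predictions,
# so ravel() yields tn, fp, fn, tp in that order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

precision = tp / (tp + fp)  # of the flagged transactions, how many were fraud
recall = tp / (tp + fn)     # of the actual frauds, how many were flagged
f1 = 2 * precision * recall / (precision + recall)

p, r, f, _ = precision_recall_fscore_support(y_true, y_pred)
print(tn, fp, fn, tp)
print(np.isclose([precision, recall, f1], [p[1], r[1], f[1]]).all())
```

For fraud detection, recall on class 1 is usually the figure to watch, since a missed fraud tends to cost more than a false alarm.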
In [49]:
cnn.save('fraud_detection_model.h5')
deepCC
In [51]:
!deepCC fraud_detection_model.h5