NOTE: This Use Case is not purposed for resource constrained devices.
DeepFake Face Detection¶
Credit: AITS Cainvas Community
Photo by Javier Jaén, Svetikd on The New Yorker
We have seen after the development of GANs, Deepfakes came into existence.Though the development of these techniques were primarily to increase the amount of training data but many people were found misusing these techniques for criminal activities. So, it is the need of the hour to develop one such model which can differentiate between real and deepfake faces.¶
Import Dataset and Necessary Libraries¶
In [1]:
!wget -N "https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/realVSfake.zip"
!unzip -qo realVSfake.zip
!rm realVSfake.zip
In [2]:
import numpy as np
import pandas as pd
from keras.applications.mobilenet import preprocess_input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dropout, Dense,BatchNormalization, Flatten, MaxPool2D
from keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau, Callback
from keras.layers import Conv2D, Reshape
from keras.utils import Sequence
from keras.backend import epsilon
import tensorflow as tf
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from tensorflow.keras.layers import GlobalAveragePooling2D
from tensorflow.keras.optimizers import Adam
from tensorflow.python.keras.preprocessing.image import ImageDataGenerator
from keras.layers import Convolution2D, Conv2D, MaxPooling2D, GlobalAveragePooling2D
import cv2
from tqdm.notebook import tqdm_notebook as tqdm
import os
In [3]:
print(os.listdir("realVSfake/real_and_fake_face"))
In [4]:
real = "realVSfake/real_and_fake_face/training_real/"
fake = "realVSfake/real_and_fake_face/training_fake/"
real_path = os.listdir(real)
fake_path = os.listdir(fake)
Visulaizing the real and fake faces¶
In [5]:
def load_img(path):
image = cv2.imread(path)
image = cv2.resize(image,(224, 224))
return image[...,::-1]
In [6]:
fig = plt.figure(figsize=(10, 10))
for i in range(16):
plt.subplot(4, 4, i+1)
plt.imshow(load_img(real + real_path[i]), cmap='gray')
plt.suptitle("Real faces",fontsize=20)
plt.axis('off')
plt.show()
In [7]:
fig = plt.figure(figsize=(10,10))
for i in range(16):
plt.subplot(4, 4, i+1)
plt.imshow(load_img(fake + fake_path[i]), cmap='gray')
plt.suptitle("Fakes faces",fontsize=20)
plt.title(fake_path[i][:4])
plt.axis('off')
plt.show()
In [8]:
dataset_path = "realVSfake/real_and_fake_face"
Data augumentation and Data Loader¶
In [9]:
data_with_aug = ImageDataGenerator(horizontal_flip=True,
vertical_flip=False,
rescale=1./255,
validation_split=0.2)
In [10]:
val = data_with_aug.flow_from_directory(dataset_path,
class_mode="binary",
target_size=(224, 224),
batch_size=32,
subset="validation"
)
In [11]:
train = data_with_aug.flow_from_directory(dataset_path,
class_mode="binary",
target_size=(224, 224),
batch_size=32)
Building VGG16 model from Scratch¶
In [12]:
# The original model does not contain Dropout Layers
vgg_model = Sequential()
vgg_model.add(Conv2D(filters=64, kernel_size=1, input_shape=(224, 224, 3), activation='relu'))
vgg_model.add(Conv2D(filters=64, kernel_size=1, activation='relu'))
vgg_model.add(MaxPooling2D(pool_size=2))
vgg_model.add(Dropout(0.2))
vgg_model.add(Conv2D(filters=128, kernel_size=1, activation='relu'))
vgg_model.add(Conv2D(filters=128, kernel_size=1, activation='relu'))
vgg_model.add(MaxPooling2D(pool_size=2))
vgg_model.add(Dropout(0.2))
vgg_model.add(Conv2D(filters=256, kernel_size=1, activation='relu'))
vgg_model.add(Conv2D(filters=256, kernel_size=1, activation='relu'))
vgg_model.add(Conv2D(filters=256, kernel_size=1, activation='relu'))
vgg_model.add(MaxPooling2D(pool_size=2))
vgg_model.add(Dropout(0.2))
vgg_model.add(Conv2D(filters=512, kernel_size=1, activation='relu'))
vgg_model.add(Conv2D(filters=512, kernel_size=1, activation='relu'))
vgg_model.add(Conv2D(filters=512, kernel_size=1, activation='relu'))
vgg_model.add(MaxPooling2D(pool_size=2))
vgg_model.add(Dropout(0.2))
vgg_model.add(Conv2D(filters=512, kernel_size=1, activation='relu'))
vgg_model.add(Conv2D(filters=512, kernel_size=1, activation='relu'))
vgg_model.add(Conv2D(filters=512, kernel_size=1, activation='relu'))
vgg_model.add(MaxPooling2D(pool_size=2))
vgg_model.add(Dropout(0.2))
vgg_model.add(Flatten())
vgg_model.add(Dense(256, activation='relu'))
vgg_model.add(Dense(128, activation='relu'))
vgg_model.add(Dense(2, activation='softmax'))
vgg_model.summary()
Model Training¶
In [13]:
vgg_model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics="accuracy")
history = vgg_model.fit(train,
epochs=10
)
In [14]:
#Creating an array of predicted test images
vgg_predictions = vgg_model.predict(val)
In [15]:
scores = vgg_model.evaluate(val, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])
We trained the model for only 10 epochs as we cans see that the model performance was not increasing and it managed to achieve only 57% accuracy.This happened due to less training data and originally VGG16 was trained on a very large dataset and for a very long time.Model performance can be increased on a larger dataset.¶
Save our model¶
In [16]:
# We are saving our model so that we can compile it later with DeepC Compiler
vgg_model.save("real_vs_fake.h5")
Loading origianal VGG16 pretrained Model from Keras¶
- VGG16 was originally trained on imagenet dataset but we will be using transfer learning to use VGG16 on our dataset
In [17]:
vgg16_model = tf.keras.applications.vgg16.VGG16(include_top=False, weights="imagenet", input_shape=(224,224,3))
In [18]:
# Viewing the last convolutional layer output shape
vgg16_model.output[-1]
Out[18]:
In [19]:
model = Sequential([vgg16_model,
GlobalAveragePooling2D(),
Dense(512, activation = "relu"),
BatchNormalization(),
Dense(128, activation = "relu"),
Dense(2, activation = "softmax")])
# We will be training only our dense layers not the entire VGG16 model
model.layers[0].trainable = False
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics="accuracy")
model.summary()
Model Training¶
In [20]:
history = model.fit(train,
epochs=15
)
Predictions¶
In [21]:
#Creating an array of predicted test images
predictions = model.predict(val)
As we can see by using transfer learning we obtained around 98% accuracy.¶
In [22]:
scores = model.evaluate(val, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])
Accessing Model Performance¶
In [23]:
val_path = "realVSfake/real_and_fake_face/"
plt.figure(figsize=(15,15))
start_index = 90
for i in range(16):
plt.subplot(4,4, i+1)
plt.grid(False)
plt.xticks([])
plt.yticks([])
preds = np.argmax(predictions[[start_index+i]])
gt = val.filenames[start_index+i][9:13]
if gt == "fake":
gt = 0
else:
gt = 1
if preds != gt:
col ="r"
else:
col = "g"
plt.xlabel('i={}, pred={}, gt={}'.format(start_index+i,preds,gt),color=col)
plt.imshow(load_img(val_path+val.filenames[start_index+i]))
plt.tight_layout()
plt.show()
Compiling our model which we saved using DeepC Compiler¶
In [24]:
!deepCC real_vs_fake.h5