Cainvas

Fingerprint Pattern Classification

Credit: AITS Cainvas Community

Photo by Manu Designer on Dribbble

A fingerprint is a unique feature of each person, and fingerprints can be grouped into a small number of pattern types. In this notebook, we identify and classify real fingerprint patterns with a convolutional neural network (CNN).

Let's get started!

Importing the necessary libraries

In [1]:
import os
import random
import cv2
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras import layers

Getting Data

In [2]:
!wget -N https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/dataset_HFu5SVU.zip
!unzip -qo dataset_HFu5SVU.zip
data_dir = 'dataset'  # avoid shadowing the built-in dir()
--2021-08-27 05:42:38--  https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/dataset_HFu5SVU.zip
Resolving cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)... 52.219.156.43
Connecting to cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)|52.219.156.43|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 54203739 (52M) [application/x-zip-compressed]
Saving to: ‘dataset_HFu5SVU.zip’

dataset_HFu5SVU.zip 100%[===================>]  51.69M  92.1MB/s    in 0.6s    

2021-08-27 05:42:38 (92.1 MB/s) - ‘dataset_HFu5SVU.zip’ saved [54203739/54203739]

Data Analysis


The data is in the format:
  • name.png (For example, f0038_02.png)
  • name.txt (For example, f0038_02.txt)

A sample of the contents of one text file is given below:

  • Gender: M
  • Class: T
  • History: f0038_02.pct TL a2652.pct

There are 5 different classes, namely:

  • Arch (A)
  • Left Loop (L)
  • Right Loop (R)
  • Tented Arch (T)
  • Whorl (W)
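
Before parsing, we can sanity-check that every annotation file has a matching image. This is a minimal sketch that assumes each image/annotation pair shares the same file stem, as the sample names above suggest (`data_dir` is the dataset folder defined earlier):

txt_stems = {f[:-4] for f in os.listdir(data_dir) if f.endswith('.txt')}
png_stems = {f[:-4] for f in os.listdir(data_dir) if f.endswith('.png')}
print('Annotations without an image:', txt_stems - png_stems)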

Reading the text files and saving the required information to a CSV file

In [3]:
labels = []
img_names = []
img_paths = []
gender = []
for file in os.listdir(data_dir):
    if file.endswith('.txt'):
        with open(os.path.join(data_dir, file), 'r') as t:
            content = t.readlines()
            # Line 0, e.g. "Gender: M" -> first character of the value
            gender.append(content[0].rsplit(' ')[1][0])
            # Line 2, e.g. "History: f0038_02.pct TL a2652.pct" -> image name from the first token
            img_name = content[2].rsplit(' ')[1][:-4] + '.png'
            img_paths.append(os.path.join(data_dir, img_name))
            img_names.append(img_name)
            # Line 1, e.g. "Class: T" -> single-letter pattern label
            labels.append(content[1].rsplit(' ')[1][0])
In [4]:
df = pd.DataFrame()
df['IMAGE PATH'] = img_paths
df['IMAGE NAME'] = img_names
df['LABEL'] = labels
df['GENDER'] = gender
In [5]:
df.head()
Out[5]:
IMAGE PATH IMAGE NAME LABEL GENDER
0 dataset/s1913_06.png s1913_06.png L M
1 dataset/f1898_05.png f1898_05.png W M
2 dataset/s0087_03.png s0087_03.png T M
3 dataset/s0523_06.png s0523_06.png L M
4 dataset/s0688_09.png s0688_09.png T M

Checking the data for any imbalance

In [6]:
fig, axes = plt.subplots(1, 2, figsize=(15, 5))
sns.countplot(ax=axes[0], data = df, x = 'LABEL')
sns.countplot(ax=axes[1], data = df, x = 'LABEL', hue = 'GENDER')
Out[6]:
<AxesSubplot:xlabel='LABEL', ylabel='count'>
In [7]:
df['LABEL'].value_counts()
Out[7]:
L    800
A    800
R    800
T    800
W    800
Name: LABEL, dtype: int64

From the plots above, it is evident that the gender category is heavily imbalanced. We won't train the model on that category, so we can drop it. The labels are perfectly balanced, so we will use them without any changes.
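
For a numeric view of the skew before dropping the column, one option (a minimal sketch) is:

# Proportion of each gender label in the dataset
print(df['GENDER'].value_counts(normalize=True))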

In [8]:
df.drop(columns = 'GENDER',inplace=True)
df.head()
Out[8]:
IMAGE PATH IMAGE NAME LABEL
0 dataset/s1913_06.png s1913_06.png L
1 dataset/f1898_05.png f1898_05.png W
2 dataset/s0087_03.png s0087_03.png T
3 dataset/s0523_06.png s0523_06.png L
4 dataset/s0688_09.png s0688_09.png T

Mapping the classes to integers

In [9]:
classes = list(np.unique(labels))
print(classes)
map_classes = dict(zip(classes, range(len(classes))))
print(map_classes)
df['MAPPED LABELS'] = [map_classes[i] for i in df['LABEL']]
df = df.sample(frac = 1)
df.to_csv('dataset.csv')
df.head()
['A', 'L', 'R', 'T', 'W']
{'A': 0, 'L': 1, 'R': 2, 'T': 3, 'W': 4}
Out[9]:
IMAGE PATH IMAGE NAME LABEL MAPPED LABELS
3199 dataset/s0713_03.png s0713_03.png R 2
2331 dataset/f1538_04.png f1538_04.png R 2
1589 dataset/f1003_04.png f1003_04.png T 3
97 dataset/f0531_02.png f0531_02.png T 3
3683 dataset/f1578_01.png f1578_01.png W 4

Plotting one image from each of the different classes

In [11]:
dim = len(classes)
fig,axes = plt.subplots(1,dim) 
fig.subplots_adjust(0,0,2,2)
for idx, i in enumerate(classes):
    dum = df[df['LABEL'] == i]
    random_num = random.choice(dum.index)
    label = df.loc[random_num]['LABEL']
    axes[idx].imshow(cv2.imread(df.loc[random_num]['IMAGE PATH']))
    axes[idx].set_title("CLASS: "+label +"\n" +  "LABEL:"+str(map_classes[label]))
    axes[idx].axis('off')

Checking if the images are grayscale

In [12]:
random_number = random.randint(0, len(df) - 1)  # randint is inclusive on both ends
img_path = df.loc[random_number]['IMAGE PATH']
gray_img = cv2.imread(img_path,0)
color_img = cv2.imread(img_path)
resized_img = cv2.resize(cv2.imread(img_path,0), (128,128)) #Resized Grayscale image

fig, axes = plt.subplots(1, 3, figsize=(15, 5))
axes[0].set_title('SINGLE CHANNEL\n'+ str(gray_img.shape))
axes[0].imshow(gray_img, cmap='gray')
axes[1].set_title('THREE CHANNELS\n'+ str(color_img.shape))
axes[1].imshow(color_img)
axes[2].set_title('RESIZED IMAGE\n'+ str(resized_img.shape))
axes[2].imshow(resized_img, cmap='gray')
Out[12]:
<matplotlib.image.AxesImage at 0x7fc40474aeb8>

The images are of size (128, 128); read in color with cv2.imread they have shape (128, 128, 3), which is the input shape the model expects. Now we can proceed to building a model.

Model and Inference

Data Preparation

In [13]:
X_data = df['IMAGE PATH']
y_data = df['MAPPED LABELS']
X_train, X_test, y_train, y_test = train_test_split(X_data, y_data, shuffle=True, test_size=0.01,stratify=y_data)
#Creating numpy arrays of images
X = []
y = []
for i in X_train:
    X.append(cv2.imread(i))
for i in y_train:
    y.append(i)
X = np.array(X)
y = np.array(y)
# Converting the labels vector to one-hot format
y = keras.utils.to_categorical(y, 5)
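
As a concrete illustration of the one-hot format (a minimal standalone example), the mapped label 2 (Right Loop) becomes a length-5 vector with a single 1:

# Illustration: one-hot encoding of a single mapped label
print(keras.utils.to_categorical(2, 5))  # -> [0. 0. 1. 0. 0.]
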
In [14]:
print(f"Total number of Images: {len(X_data)}")
print(f"Number of Training Images: {len(X_train)}")
print(f"Number of Test Images: {len(X_test)}") # Saving a small number of images for model testing|
print(f"Shape of Images: {X[0].shape}") #Printing the shape of Images
Total number of Images: 4000
Number of Training Images: 3960
Number of Test Images: 40
Shape of Images: (128, 128, 3)
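
Because `stratify=y_data` was passed to `train_test_split`, the 40 test images should be spread evenly over the 5 classes (40 / 5 = 8 each); a quick check (a small sketch):

# Each class should contribute 8 of the 40 test images
print(y_test.value_counts())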

Model Architecture

In [15]:
model = keras.Sequential(
    [
        layers.Conv2D(32, input_shape=(128,128,3),padding="same",kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(32, kernel_size=(3, 3), padding="same",activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), padding="same",activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3),padding="same",activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3),padding="same",activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(128, kernel_size=(3, 3),padding="same",activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(5, activation="softmax",kernel_regularizer='l1_l2'),
    ]
)

Checking the model parameters

In [16]:
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 128, 128, 32)      896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 64, 64, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 64, 64, 32)        9248      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 32, 32, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 32, 32, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 16, 16, 64)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 16, 16, 64)        36928     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 8, 8, 64)          0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 8, 8, 64)          36928     
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 4, 4, 64)          0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 4, 4, 128)         73856     
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 2, 2, 128)         0         
_________________________________________________________________
flatten (Flatten)            (None, 512)               0         
_________________________________________________________________
dropout (Dropout)            (None, 512)               0         
_________________________________________________________________
dense (Dense)                (None, 5)                 2565      
=================================================================
Total params: 178,917
Trainable params: 178,917
Non-trainable params: 0
_________________________________________________________________
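
These counts follow directly from the layer shapes: the first convolution has (3·3·3 + 1)·32 = 896 parameters (3x3 kernels over 3 input channels, plus a bias per filter), and after six 2x2 poolings the 128x128 feature maps shrink to 2x2, so the Flatten layer outputs 2·2·128 = 512 features and the final Dense layer needs (512 + 1)·5 = 2565 parameters.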

Let the training begin!

In [17]:
#Hyperparameters
LOSS = 'categorical_crossentropy'
OPTIMIZER = 'adam'
BATCH_SIZE = 64
EPOCHS = 20
In [ ]:
model.compile(loss=LOSS, optimizer=OPTIMIZER, metrics=['accuracy'])

history=model.fit(x=X, y=y, batch_size=BATCH_SIZE, epochs=EPOCHS, validation_split=0.2)
Epoch 1/20
 1/50 [..............................] - ETA: 0s - loss: 18.1598 - accuracy: 0.2344WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0069s vs `on_train_batch_end` time: 0.0119s). Check your callbacks.
50/50 [==============================] - 1s 28ms/step - loss: 4.0465 - accuracy: 0.1954 - val_loss: 2.8399 - val_accuracy: 0.2424
Epoch 2/20
50/50 [==============================] - 1s 20ms/step - loss: 2.7170 - accuracy: 0.2794 - val_loss: 2.5680 - val_accuracy: 0.3775
Epoch 3/20
50/50 [==============================] - 1s 20ms/step - loss: 2.3745 - accuracy: 0.4154 - val_loss: 2.1045 - val_accuracy: 0.4949
Epoch 4/20
50/50 [==============================] - 1s 20ms/step - loss: 1.9787 - accuracy: 0.5395 - val_loss: 1.7295 - val_accuracy: 0.6098
Epoch 5/20
50/50 [==============================] - 1s 20ms/step - loss: 1.6967 - accuracy: 0.6196 - val_loss: 1.5379 - val_accuracy: 0.6591
Epoch 6/20
50/50 [==============================] - 1s 20ms/step - loss: 1.4865 - accuracy: 0.6676 - val_loss: 1.3845 - val_accuracy: 0.6856
Epoch 7/20
50/50 [==============================] - 1s 20ms/step - loss: 1.2890 - accuracy: 0.7229 - val_loss: 1.2384 - val_accuracy: 0.7121
Epoch 8/20
50/50 [==============================] - 1s 20ms/step - loss: 1.1819 - accuracy: 0.7377 - val_loss: 1.0717 - val_accuracy: 0.7652
Epoch 9/20
50/50 [==============================] - 1s 20ms/step - loss: 1.0472 - accuracy: 0.7759 - val_loss: 0.9677 - val_accuracy: 0.7753
Epoch 10/20
50/50 [==============================] - 1s 20ms/step - loss: 0.9738 - accuracy: 0.7790 - val_loss: 0.8944 - val_accuracy: 0.7955
Epoch 11/20
50/50 [==============================] - 1s 20ms/step - loss: 0.8502 - accuracy: 0.8119 - val_loss: 0.8231 - val_accuracy: 0.8056
Epoch 12/20
50/50 [==============================] - 1s 20ms/step - loss: 0.8046 - accuracy: 0.8097 - val_loss: 0.7901 - val_accuracy: 0.8106
Epoch 13/20
50/50 [==============================] - 1s 20ms/step - loss: 0.7478 - accuracy: 0.8201 - val_loss: 0.7782 - val_accuracy: 0.8056

Plotting Loss and Accuracy graphs

In [ ]:
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['train', 'val'], loc='center right')
plt.show()

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'val'], loc='upper right')
plt.show()

Preparing data for testing

In [ ]:
test_X = []
for i in X_test:
    im = cv2.imread(i)
    # Keep a leading batch dimension so each image can be passed to model.predict directly
    im = np.reshape(im, (1, 128, 128, 3))
    test_X.append(im)
test_X = np.array(test_X)
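
As an aside (a minimal sketch under the same 128x128 RGB assumption), the whole test set can also be stacked into one batch and predicted in a single call:

# Alternative: batch all test images together and predict once
batch = np.stack([cv2.imread(p) for p in X_test])  # shape (40, 128, 128, 3)
pred_classes = np.argmax(model.predict(batch), axis=1)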

Plotting the predictions for the test images

In [ ]:
fig, axes = plt.subplots(5, 5)
fig.subplots_adjust(0, 0, 3, 3)
for i in range(5):
    for j in range(5):
        num = random.randint(0, len(test_X) - 1)
        display_image = test_X[num].squeeze(0)
        image = test_X[num]
        predicted_prob = model.predict(image)
        predicted_class = np.argmax(predicted_prob)
        ground_truth = classes[y_test.iloc[num]]
        axes[i, j].imshow(display_image)
        if classes[predicted_class] != ground_truth:
            t = 'PREDICTED {} \n GROUND TRUTH [{}]'.format(classes[predicted_class], ground_truth)
            axes[i, j].set_title(t, fontdict={'color': 'darkred'})
        else:
            t = '[CORRECT] {}'.format(classes[predicted_class])
            axes[i, j].set_title(t)
        axes[i, j].axis('off')

Saving the model

In [ ]:
#Saving the model
model.save('fingerprint.h5')
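
The saved HDF5 file can later be reloaded for inference with the standard Keras API (a minimal sketch):

# Reload the saved model and run a sanity prediction on one test image
reloaded = keras.models.load_model('fingerprint.h5')
print(np.argmax(reloaded.predict(test_X[0]), axis=1))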

DeepCC

In [ ]:
!deepCC fingerprint.h5