Cainvas

Fingerprint Pattern Classification

Credit: AITS Cainvas Community

Photo by Manu Designer on Dribbble

A fingerprint is a unique feature of each person, and fingerprints can be grouped into a small number of pattern types. In this notebook, we identify and classify real fingerprint patterns with a convolutional neural network (CNN).

Let's get started!

Importing the necessary libraries

In [1]:
import os
import random
import cv2
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras import layers

Getting Data

In [2]:
!wget -N https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/dataset_HFu5SVU.zip
!unzip -qo dataset_HFu5SVU.zip
data_dir = 'dataset'  # avoid shadowing the built-in dir()
--2021-08-27 05:42:38--  https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/dataset_HFu5SVU.zip
Resolving cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)... 52.219.156.43
Connecting to cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)|52.219.156.43|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 54203739 (52M) [application/x-zip-compressed]
Saving to: ‘dataset_HFu5SVU.zip’

dataset_HFu5SVU.zip 100%[===================>]  51.69M  92.1MB/s    in 0.6s    

2021-08-27 05:42:38 (92.1 MB/s) - ‘dataset_HFu5SVU.zip’ saved [54203739/54203739]

Data Analysis


The data is in the format:
  • name.png (For example, f0038_02.png)
  • name.txt (For example, f0038_02.txt)

A sample of the contents of one text file is given below:

  • Gender: M
  • Class: T
  • History: f0038_02.pct TL a2652.pct

There are 5 different classes, namely:

  • Arch (A)
  • Left Loop (L)
  • Right Loop (R)
  • Tented Arch (T)
  • Whorl (W)
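
Before parsing, we can sanity-check that every annotation file has a matching image. This is a minimal sketch that assumes each image/annotation pair shares the same file stem, as the sample names above suggest (`data_dir` is the dataset folder defined earlier):

txt_stems = {f[:-4] for f in os.listdir(data_dir) if f.endswith('.txt')}
png_stems = {f[:-4] for f in os.listdir(data_dir) if f.endswith('.png')}
print('Annotations without an image:', txt_stems - png_stems)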

Reading the text files and saving the required information to a CSV file

In [3]:
labels = []
img_names = []
img_paths = []
gender = []
for file in os.listdir(data_dir):
    if file.endswith('.txt'):
        with open(os.path.join(data_dir, file), 'r') as t:
            content = t.readlines()
            # Line 0, e.g. "Gender: M" -> first character of the value
            gender.append(content[0].rsplit(' ')[1][0])
            # Line 2, e.g. "History: f0038_02.pct TL a2652.pct" -> image name from the first token
            img_name = content[2].rsplit(' ')[1][:-4] + '.png'
            img_paths.append(os.path.join(data_dir, img_name))
            img_names.append(img_name)
            # Line 1, e.g. "Class: T" -> single-letter pattern label
            labels.append(content[1].rsplit(' ')[1][0])
In [4]:
df = pd.DataFrame()
df['IMAGE PATH'] = img_paths
df['IMAGE NAME'] = img_names
df['LABEL'] = labels
df['GENDER'] = gender
In [5]:
df.head()
Out[5]:
IMAGE PATH IMAGE NAME LABEL GENDER
0 dataset/s1913_06.png s1913_06.png L M
1 dataset/f1898_05.png f1898_05.png W M
2 dataset/s0087_03.png s0087_03.png T M
3 dataset/s0523_06.png s0523_06.png L M
4 dataset/s0688_09.png s0688_09.png T M

Checking the data for any imbalance

In [6]:
fig, axes = plt.subplots(1, 2, figsize=(15, 5))
sns.countplot(ax=axes[0], data = df, x = 'LABEL')
sns.countplot(ax=axes[1], data = df, x = 'LABEL', hue = 'GENDER')
Out[6]:
<AxesSubplot:xlabel='LABEL', ylabel='count'>
In [7]:
df['LABEL'].value_counts()
Out[7]:
L    800
A    800
R    800
T    800
W    800
Name: LABEL, dtype: int64

From the plots above, it is evident that the gender category is heavily imbalanced. We won't train the model on that category, so we can drop it. The labels are perfectly balanced, so we will use them without any changes.
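
For a numeric view of the skew before dropping the column, one option (a minimal sketch) is:

# Proportion of each gender label in the dataset
print(df['GENDER'].value_counts(normalize=True))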

In [8]:
df.drop(columns = 'GENDER',inplace=True)
df.head()
Out[8]:
IMAGE PATH IMAGE NAME LABEL
0 dataset/s1913_06.png s1913_06.png L
1 dataset/f1898_05.png f1898_05.png W
2 dataset/s0087_03.png s0087_03.png T
3 dataset/s0523_06.png s0523_06.png L
4 dataset/s0688_09.png s0688_09.png T

Mapping the classes to integers

In [9]:
classes = list(np.unique(labels))
print(classes)
map_classes = dict(zip(classes, range(len(classes))))
print(map_classes)
df['MAPPED LABELS'] = [map_classes[i] for i in df['LABEL']]
df = df.sample(frac = 1)
df.to_csv('dataset.csv')
df.head()
['A', 'L', 'R', 'T', 'W']
{'A': 0, 'L': 1, 'R': 2, 'T': 3, 'W': 4}
Out[9]:
IMAGE PATH IMAGE NAME LABEL MAPPED LABELS
3199 dataset/s0713_03.png s0713_03.png R 2
2331 dataset/f1538_04.png f1538_04.png R 2
1589 dataset/f1003_04.png f1003_04.png T 3
97 dataset/f0531_02.png f0531_02.png T 3
3683 dataset/f1578_01.png f1578_01.png W 4

Plotting one image from each of the different classes

In [11]:
dim = len(classes)
fig,axes = plt.subplots(1,dim) 
fig.subplots_adjust(0,0,2,2)
for idx, i in enumerate(classes):
    dum = df[df['LABEL'] == i]
    random_num = random.choice(dum.index)
    label = df.loc[random_num]['LABEL']
    axes[idx].imshow(cv2.imread(df.loc[random_num]['IMAGE PATH']))
    axes[idx].set_title("CLASS: "+label +"\n" +  "LABEL:"+str(map_classes[label]))
    axes[idx].axis('off')

Checking if the images are grayscale

In [12]:
random_number = random.randint(0, len(df) - 1)  # randint is inclusive on both ends
img_path = df.loc[random_number]['IMAGE PATH']
gray_img = cv2.imread(img_path,0)
color_img = cv2.imread(img_path)
resized_img = cv2.resize(cv2.imread(img_path,0), (128,128)) #Resized Grayscale image

fig, axes = plt.subplots(1, 3, figsize=(15, 5))
axes[0].set_title('SINGLE CHANNEL\n'+ str(gray_img.shape))
axes[0].imshow(gray_img, cmap='gray')
axes[1].set_title('THREE CHANNELS\n'+ str(color_img.shape))
axes[1].imshow(color_img)
axes[2].set_title('RESIZED IMAGE\n'+ str(resized_img.shape))
axes[2].imshow(resized_img, cmap='gray')
Out[12]:
<matplotlib.image.AxesImage at 0x7fc40474aeb8>

The images are of size (128, 128); read in color with cv2.imread they have shape (128, 128, 3), which is the input shape the model expects. Now we can proceed to building a model.

Model and Inference

Data Preparation

In [13]:
X_data = df['IMAGE PATH']
y_data = df['MAPPED LABELS']
X_train, X_test, y_train, y_test = train_test_split(X_data, y_data, shuffle=True, test_size=0.01,stratify=y_data)
#Creating numpy arrays of images
X = []
y = []
for i in X_train:
    X.append(cv2.imread(i))
for i in y_train:
    y.append(i)
X = np.array(X)
y = np.array(y)
# Converting the labels vector to one-hot format
y = keras.utils.to_categorical(y, 5)
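
As a concrete illustration of the one-hot format (a minimal standalone example), the mapped label 2 (Right Loop) becomes a length-5 vector with a single 1:

# Illustration: one-hot encoding of a single mapped label
print(keras.utils.to_categorical(2, 5))  # -> [0. 0. 1. 0. 0.]
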
In [14]:
print(f"Total number of Images: {len(X_data)}")
print(f"Number of Training Images: {len(X_train)}")
print(f"Number of Test Images: {len(X_test)}") # Saving a small number of images for model testing|
print(f"Shape of Images: {X[0].shape}") #Printing the shape of Images
Total number of Images: 4000
Number of Training Images: 3960
Number of Test Images: 40
Shape of Images: (128, 128, 3)
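
Because `stratify=y_data` was passed to `train_test_split`, the 40 test images should be spread evenly over the 5 classes (40 / 5 = 8 each); a quick check (a small sketch):

# Each class should contribute 8 of the 40 test images
print(y_test.value_counts())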

Model Architecture

In [15]:
model = keras.Sequential(
    [
        layers.Conv2D(32, input_shape=(128,128,3),padding="same",kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(32, kernel_size=(3, 3), padding="same",activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), padding="same",activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3),padding="same",activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3),padding="same",activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(128, kernel_size=(3, 3),padding="same",activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(5, activation="softmax",kernel_regularizer='l1_l2'),
    ]
)

Checking the model parameters

In [16]:
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 128, 128, 32)      896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 64, 64, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 64, 64, 32)        9248      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 32, 32, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 32, 32, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 16, 16, 64)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 16, 16, 64)        36928     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 8, 8, 64)          0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 8, 8, 64)          36928     
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 4, 4, 64)          0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 4, 4, 128)         73856     
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 2, 2, 128)         0         
_________________________________________________________________
flatten (Flatten)            (None, 512)               0         
_________________________________________________________________
dropout (Dropout)            (None, 512)               0         
_________________________________________________________________
dense (Dense)                (None, 5)                 2565      
=================================================================
Total params: 178,917
Trainable params: 178,917
Non-trainable params: 0
_________________________________________________________________
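
These counts follow directly from the layer shapes: the first convolution has (3·3·3 + 1)·32 = 896 parameters (3x3 kernels over 3 input channels, plus a bias per filter), and after six 2x2 poolings the 128x128 feature maps shrink to 2x2, so the Flatten layer outputs 2·2·128 = 512 features and the final Dense layer needs (512 + 1)·5 = 2565 parameters.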

Let the training begin!

In [17]:
#Hyperparameters
LOSS = 'categorical_crossentropy'
OPTIMIZER = 'adam'
BATCH_SIZE = 64
EPOCHS = 20
In [ ]:
model.compile(loss=LOSS, optimizer=OPTIMIZER, metrics=['accuracy'])

history=model.fit(x=X, y=y, batch_size=BATCH_SIZE, epochs=EPOCHS, validation_split=0.2)
Epoch 1/20
 1/50 [..............................] - ETA: 0s - loss: 18.1598 - accuracy: 0.2344WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0069s vs `on_train_batch_end` time: 0.0119s). Check your callbacks.
50/50 [==============================] - 1s 28ms/step - loss: 4.0465 - accuracy: 0.1954 - val_loss: 2.8399 - val_accuracy: 0.2424
Epoch 2/20
50/50 [==============================] - 1s 20ms/step - loss: 2.7170 - accuracy: 0.2794 - val_loss: 2.5680 - val_accuracy: 0.3775
Epoch 3/20
50/50 [==============================] - 1s 20ms/step - loss: 2.3745 - accuracy: 0.4154 - val_loss: 2.1045 - val_accuracy: 0.4949
Epoch 4/20
50/50 [==============================] - 1s 20ms/step - loss: 1.9787 - accuracy: 0.5395 - val_loss: 1.7295 - val_accuracy: 0.6098
Epoch 5/20
50/50 [==============================] - 1s 20ms/step - loss: 1.6967 - accuracy: 0.6196 - val_loss: 1.5379 - val_accuracy: 0.6591
Epoch 6/20
50/50 [==============================] - 1s 20ms/step - loss: 1.4865 - accuracy: 0.6676 - val_loss: 1.3845 - val_accuracy: 0.6856
Epoch 7/20
50/50 [==============================] - 1s 20ms/step - loss: 1.2890 - accuracy: 0.7229 - val_loss: 1.2384 - val_accuracy: 0.7121
Epoch 8/20
50/50 [==============================] - 1s 20ms/step - loss: 1.1819 - accuracy: 0.7377 - val_loss: 1.0717 - val_accuracy: 0.7652
Epoch 9/20
50/50 [==============================] - 1s 20ms/step - loss: 1.0472 - accuracy: 0.7759 - val_loss: 0.9677 - val_accuracy: 0.7753
Epoch 10/20
50/50 [==============================] - 1s 20ms/step - loss: 0.9738 - accuracy: 0.7790 - val_loss: 0.8944 - val_accuracy: 0.7955
Epoch 11/20
50/50 [==============================] - 1s 20ms/step - loss: 0.8502 - accuracy: 0.8119 - val_loss: 0.8231 - val_accuracy: 0.8056
Epoch 12/20
50/50 [==============================] - 1s 20ms/step - loss: 0.8046 - accuracy: 0.8097 - val_loss: 0.7901 - val_accuracy: 0.8106
Epoch 13/20
50/50 [==============================] - 1s 20ms/step - loss: 0.7478 - accuracy: 0.8201 - val_loss: 0.7782 - val_accuracy: 0.8056

Plotting Loss and Accuracy graphs

In [ ]:
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['train', 'val'], loc='center right')
plt.show()

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'val'], loc='upper right')
plt.show()

Preparing data for testing

In [ ]:
test_X = []
for i in X_test:
    im = cv2.imread(i)
    # Keep a leading batch dimension so each image can be passed to model.predict directly
    im = np.reshape(im, (1, 128, 128, 3))
    test_X.append(im)
test_X = np.array(test_X)
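
As an aside (a minimal sketch under the same 128x128 RGB assumption), the whole test set can also be stacked into one batch and predicted in a single call:

# Alternative: batch all test images together and predict once
batch = np.stack([cv2.imread(p) for p in X_test])  # shape (40, 128, 128, 3)
pred_classes = np.argmax(model.predict(batch), axis=1)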

Plotting the predictions for the test images

In [ ]:
fig, axes = plt.subplots(5, 5)
fig.subplots_adjust(0, 0, 3, 3)
for i in range(5):
    for j in range(5):
        num = random.randint(0, len(test_X) - 1)
        display_image = test_X[num].squeeze(0)
        image = test_X[num]
        predicted_prob = model.predict(image)
        predicted_class = np.argmax(predicted_prob)
        ground_truth = classes[y_test.iloc[num]]
        axes[i, j].imshow(display_image)
        if classes[predicted_class] != ground_truth:
            t = 'PREDICTED {} \n GROUND TRUTH [{}]'.format(classes[predicted_class], ground_truth)
            axes[i, j].set_title(t, fontdict={'color': 'darkred'})
        else:
            t = '[CORRECT] {}'.format(classes[predicted_class])
            axes[i, j].set_title(t)
        axes[i, j].axis('off')

Saving the model

In [ ]:
#Saving the model
model.save('fingerprint.h5')
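
The saved HDF5 file can later be reloaded for inference with the standard Keras API (a minimal sketch):

# Reload the saved model and run a sanity prediction on one test image
reloaded = keras.models.load_model('fingerprint.h5')
print(np.argmax(reloaded.predict(test_X[0]), axis=1))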

DeepCC

In [ ]:
!deepCC fingerprint.h5