Mineral Classification

Photo by Verstiuk Production on Dribbble

A mineral classifier can identify minerals from photographs alone, without human intervention, and can therefore assist in mineral exploitation.

Importing the Dataset

!wget -N "https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/minet.zip"
!unzip -qo minet.zip 
!rm minet.zip

minet.zip           100%[===================>] 118.00M  81.1MB/s    in 1.5s    

2020-11-20 11:58:46 (81.1 MB/s) - ‘minet.zip’ saved [123734481/123734481]

Importing Necessary Libraries

from torchvision.datasets import ImageFolder
from torchvision import transforms

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import time
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
from collections import Counter

import cv2
from keras.layers import Dense, Flatten, AveragePooling2D, Dropout
from keras.models import Model
from keras.applications.vgg16 import VGG16
from keras.preprocessing import image
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import Adam

Data Analysis

root_folder = 'minet/train'
target_label = ['biotite', 'bornite', 'chrysocolla', 'malachite', 
               'muscovite', 'pyrite', 'quartz']

dataset = ImageFolder(root_folder, transform=transforms.ToTensor())
print('Data size: ',len(dataset))
dataset.classes

Data size:  350

['biotite',
 'bornite',
 'chrysocolla',
 'malachite',
 'muscovite',
 'pyrite',
 'quartz']

# Show the first 20 images of the dataset with their labels
fig = plt.figure(figsize=(25, 4))

for i in range(20):
    image, label = dataset[i]
    ax = fig.add_subplot(2, 10, i+1, xticks=[], yticks = [])
    ax.imshow(image.permute(1,2,0))
    ax.set_title(target_label[label], color='green')

# Count the images per label from the targets stored by ImageFolder,
# which avoids loading and decoding every image
count = dict(Counter(target_label[t] for t in dataset.targets))

#insert count into dataframe
df = pd.DataFrame(count, index=np.arange(1))
df = df.transpose().reset_index()
df.columns = ['Mineral', 'count']
df

# Plot the class counts as a bar chart for readability
# (x and y are passed as keywords, which avoids seaborn's positional-argument FutureWarning)
sns.barplot(x='Mineral', y='count', data=df)
plt.title('Number of images per label')
plt.xticks(rotation=30)
plt.grid(axis='y')

# Collect the height and width of every image (tensors are C x H x W)
height = []
width = []
for i in range(len(dataset)):
    image, label = dataset[i]
    height.append(image.size(1))
    width.append(image.size(2))
print(f"maximum_height:{np.max(height)} \tminimum_height:{np.min(height)} \tmean_height:{np.mean(height)}")
print(f"maximum_width:{np.max(width)} \tminimum_width:{np.min(width)} \tmean_width:{np.mean(width)}")

maximum_height:5669 	minimum_height:148 	mean_height:722.6914285714286
maximum_width:5184 	minimum_width:144 	mean_width:826.4685714285714

Load the Data with Keras' Data Loader

data_path = "minet"
# Data augmentation for the training set; the test set is only rescaled

train_datagen = ImageDataGenerator(rescale = 1./255,
                                   zoom_range = 0.2,
                                   rotation_range=15,
                                   horizontal_flip = True)

test_datagen = ImageDataGenerator(rescale = 1./255)

# create dataset train
training_set = train_datagen.flow_from_directory(data_path + '/train',
                                                 target_size = (224, 224),
                                                 batch_size = 64,
                                                 class_mode = 'categorical',
                                                 shuffle=True)
# Create test data set
test_set = test_datagen.flow_from_directory(data_path + '/test',
                                             target_size = (224, 224),
                                             batch_size = 64,
                                             class_mode = 'categorical',
                                             shuffle = False)

Found 350 images belonging to 7 classes.
Found 70 images belonging to 7 classes.
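
The generator output above fixes how many batches make up one epoch. The sketch below works through that arithmetic using the image counts and batch size shown in this section; the helper name `batches_per_epoch` is made up for illustration and is not part of Keras.

```python
import math

num_train_images = 350  # "Found 350 images" for minet/train
num_test_images = 70    # "Found 70 images" for minet/test
batch_size = 64         # batch_size passed to flow_from_directory


def batches_per_epoch(num_images, batch_size):
    """Batches needed to see every image once; the last batch may be partial."""
    return math.ceil(num_images / batch_size)


train_steps = batches_per_epoch(num_train_images, batch_size)
test_steps = batches_per_epoch(num_test_images, batch_size)
# Size of the final, partial training batch
last_train_batch = num_train_images - (train_steps - 1) * batch_size

print(train_steps, test_steps, last_train_batch)
```

The six training batches agree with the `6/6` progress counter in the training log, and the last of them holds only 30 images.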

Model Architecture

# Build the model: a frozen VGG16 convolutional base with a new classification head

model = VGG16(input_shape=(224,224,3),include_top=False)

for layer in model.layers:
    layer.trainable = False

newModel = model.output
newModel = AveragePooling2D()(newModel)
newModel = Flatten()(newModel)
newModel = Dense(128, activation="relu")(newModel)
newModel = Dropout(0.5)(newModel)
newModel = Dense(7, activation='softmax')(newModel)

model = Model(inputs=model.input, outputs=newModel)

model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
average_pooling2d (AveragePo (None, 3, 3, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 4608)              0         
_________________________________________________________________
dense (Dense)                (None, 128)               589952    
_________________________________________________________________
dropout (Dropout)            (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 7)                 903       
=================================================================
Total params: 15,305,543
Trainable params: 590,855
Non-trainable params: 14,714,688
_________________________________________________________________
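
The parameter counts in the summary can be checked by hand. The sketch below recomputes them from the layer shapes listed above. It assumes 3x3 convolution kernels with a bias term per filter, and a 2x2 average pooling window with stride 2 and no padding (the Keras default for `AveragePooling2D()`); the helper names are illustrative only.

```python
def conv2d_params(kernel, in_channels, out_channels):
    """Weights plus one bias per output filter."""
    return (kernel * kernel * in_channels + 1) * out_channels


def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return (n_in + 1) * n_out


# (input channels, output channels) of the 13 VGG16 conv layers in the summary
vgg16_convs = [(3, 64), (64, 64),
               (64, 128), (128, 128),
               (128, 256), (256, 256), (256, 256),
               (256, 512), (512, 512), (512, 512),
               (512, 512), (512, 512), (512, 512)]
frozen_params = sum(conv2d_params(3, i, o) for i, o in vgg16_convs)

# 7x7 feature map -> 2x2 average pooling, stride 2, no padding -> 3x3
pooled_side = (7 - 2) // 2 + 1
flat_features = pooled_side * pooled_side * 512

# New head: Dense(128) followed by Dense(7)
head_params = dense_params(flat_features, 128) + dense_params(128, 7)

print(frozen_params, flat_features, head_params, frozen_params + head_params)
```

Only the head is trained, so roughly 4% of the model's parameters are updated during fitting.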

Model Training

opt=Adam(learning_rate=0.001)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

history = model.fit(training_set,
                    validation_data=test_set,
                    epochs=40)

Epoch 1/40
6/6 [==============================] - 27s 3s/step - loss: 2.3578 - accuracy: 0.1499 - val_loss: 1.6756 - val_accuracy: 0.3571
Epoch 2/40
6/6 [==============================] - 11s 2s/step - loss: 1.7478 - accuracy: 0.2915 - val_loss: 1.4412 - val_accuracy: 0.4000
Epoch 3/40
6/6 [==============================] - 11s 2s/step - loss: 1.5084 - accuracy: 0.4109 - val_loss: 1.2450 - val_accuracy: 0.5857
Epoch 4/40
6/6 [==============================] - 11s 2s/step - loss: 1.3104 - accuracy: 0.5091 - val_loss: 1.0414 - val_accuracy: 0.6429
Epoch 5/40
6/6 [==============================] - 11s 2s/step - loss: 1.1368 - accuracy: 0.5924 - val_loss: 0.8782 - val_accuracy: 0.8000
Epoch 6/40
6/6 [==============================] - 11s 2s/step - loss: 1.1362 - accuracy: 0.5883 - val_loss: 0.7869 - val_accuracy: 0.7857
Epoch 7/40
6/6 [==============================] - 11s 2s/step - loss: 1.0500 - accuracy: 0.6175 - val_loss: 0.7040 - val_accuracy: 0.8571
Epoch 8/40
6/6 [==============================] - 11s 2s/step - loss: 0.8915 - accuracy: 0.7265 - val_loss: 0.5999 - val_accuracy: 0.8857
Epoch 9/40
6/6 [==============================] - 11s 2s/step - loss: 0.8922 - accuracy: 0.6939 - val_loss: 0.5460 - val_accuracy: 0.8714
Epoch 10/40
6/6 [==============================] - 11s 2s/step - loss: 0.7350 - accuracy: 0.7345 - val_loss: 0.5032 - val_accuracy: 0.9000
Epoch 11/40
6/6 [==============================] - 11s 2s/step - loss: 0.7862 - accuracy: 0.6978 - val_loss: 0.4756 - val_accuracy: 0.9286
Epoch 12/40
6/6 [==============================] - 11s 2s/step - loss: 0.7078 - accuracy: 0.7631 - val_loss: 0.4035 - val_accuracy: 0.9286
Epoch 13/40
6/6 [==============================] - 11s 2s/step - loss: 0.6657 - accuracy: 0.7912 - val_loss: 0.3747 - val_accuracy: 0.9429
Epoch 14/40
6/6 [==============================] - 11s 2s/step - loss: 0.6236 - accuracy: 0.8170 - val_loss: 0.3458 - val_accuracy: 0.9571
Epoch 15/40
6/6 [==============================] - 11s 2s/step - loss: 0.5781 - accuracy: 0.8072 - val_loss: 0.2976 - val_accuracy: 0.9571
Epoch 16/40
6/6 [==============================] - 11s 2s/step - loss: 0.5286 - accuracy: 0.8160 - val_loss: 0.2926 - val_accuracy: 0.9429
Epoch 17/40
6/6 [==============================] - 11s 2s/step - loss: 0.5310 - accuracy: 0.8334 - val_loss: 0.2746 - val_accuracy: 0.9571
Epoch 18/40
6/6 [==============================] - 11s 2s/step - loss: 0.5028 - accuracy: 0.8198 - val_loss: 0.2765 - val_accuracy: 0.9571
Epoch 19/40
6/6 [==============================] - 11s 2s/step - loss: 0.5070 - accuracy: 0.8068 - val_loss: 0.2478 - val_accuracy: 0.9286
Epoch 20/40
6/6 [==============================] - 11s 2s/step - loss: 0.4774 - accuracy: 0.8541 - val_loss: 0.2343 - val_accuracy: 0.9571
Epoch 21/40
6/6 [==============================] - 10s 2s/step - loss: 0.4381 - accuracy: 0.8588 - val_loss: 0.1998 - val_accuracy: 0.9714
Epoch 22/40
6/6 [==============================] - 11s 2s/step - loss: 0.4507 - accuracy: 0.8877 - val_loss: 0.1701 - val_accuracy: 1.0000
Epoch 23/40
6/6 [==============================] - 10s 2s/step - loss: 0.3584 - accuracy: 0.8882 - val_loss: 0.1623 - val_accuracy: 0.9714
Epoch 24/40
6/6 [==============================] - 10s 2s/step - loss: 0.4024 - accuracy: 0.8788 - val_loss: 0.1507 - val_accuracy: 1.0000
Epoch 25/40
6/6 [==============================] - 10s 2s/step - loss: 0.3310 - accuracy: 0.9361 - val_loss: 0.1438 - val_accuracy: 0.9714
Epoch 26/40
6/6 [==============================] - 11s 2s/step - loss: 0.3418 - accuracy: 0.9080 - val_loss: 0.1423 - val_accuracy: 1.0000
Epoch 27/40
6/6 [==============================] - 11s 2s/step - loss: 0.3393 - accuracy: 0.9311 - val_loss: 0.1306 - val_accuracy: 1.0000
Epoch 28/40
6/6 [==============================] - 11s 2s/step - loss: 0.3339 - accuracy: 0.8944 - val_loss: 0.1198 - val_accuracy: 1.0000
Epoch 29/40
6/6 [==============================] - 11s 2s/step - loss: 0.3516 - accuracy: 0.8969 - val_loss: 0.1068 - val_accuracy: 1.0000
Epoch 30/40
6/6 [==============================] - 11s 2s/step - loss: 0.3174 - accuracy: 0.9004 - val_loss: 0.1276 - val_accuracy: 0.9714
Epoch 31/40
6/6 [==============================] - 11s 2s/step - loss: 0.3031 - accuracy: 0.9225 - val_loss: 0.0951 - val_accuracy: 1.0000
Epoch 32/40
6/6 [==============================] - 11s 2s/step - loss: 0.3049 - accuracy: 0.9160 - val_loss: 0.0903 - val_accuracy: 1.0000
Epoch 33/40
6/6 [==============================] - 10s 2s/step - loss: 0.2898 - accuracy: 0.9219 - val_loss: 0.0934 - val_accuracy: 1.0000
Epoch 34/40
6/6 [==============================] - 11s 2s/step - loss: 0.2245 - accuracy: 0.9403 - val_loss: 0.0752 - val_accuracy: 1.0000
Epoch 35/40
6/6 [==============================] - 10s 2s/step - loss: 0.2743 - accuracy: 0.9268 - val_loss: 0.0832 - val_accuracy: 1.0000
Epoch 36/40
6/6 [==============================] - 11s 2s/step - loss: 0.2556 - accuracy: 0.9324 - val_loss: 0.0759 - val_accuracy: 1.0000
Epoch 37/40
6/6 [==============================] - 11s 2s/step - loss: 0.2430 - accuracy: 0.9502 - val_loss: 0.0635 - val_accuracy: 1.0000
Epoch 38/40
6/6 [==============================] - 11s 2s/step - loss: 0.2140 - accuracy: 0.9436 - val_loss: 0.0608 - val_accuracy: 1.0000
Epoch 39/40
6/6 [==============================] - 11s 2s/step - loss: 0.2048 - accuracy: 0.9566 - val_loss: 0.0706 - val_accuracy: 1.0000
Epoch 40/40
6/6 [==============================] - 11s 2s/step - loss: 0.2070 - accuracy: 0.9428 - val_loss: 0.0520 - val_accuracy: 1.0000
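
Because the validation data is the 70-image test set, each image accounts for about 1.4 percentage points of `val_accuracy`. The sketch below converts a few of the logged values into a number of misclassified images to make that granularity visible. The accuracies are copied from the log above; the helper `misclassified` is a name chosen for this illustration.

```python
num_val_images = 70  # size of the test set used as validation data

# epoch -> val_accuracy, copied from the training log
logged_val_accuracy = {1: 0.3571, 5: 0.8000, 10: 0.9000,
                       21: 0.9714, 22: 1.0000, 40: 1.0000}


def misclassified(accuracy, num_images):
    """Number of wrong predictions implied by a rounded accuracy value."""
    return round((1 - accuracy) * num_images)


errors = {epoch: misclassified(acc, num_val_images)
          for epoch, acc in logged_val_accuracy.items()}

for epoch, wrong in errors.items():
    print(f"epoch {epoch:2d}: {wrong} of {num_val_images} images misclassified")
```

Seen this way, the swings between 0.9714 and 1.0000 late in training correspond to only two images, so small changes in validation accuracy should be read with caution.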

Training Plots

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs=range(len(acc))

plt.plot(epochs,acc,label='Training_acc',color='blue')
plt.plot(epochs,val_acc,label='Validation_acc',color='red')
plt.legend()
plt.title("Training and Validation Accuracy")

Text(0.5, 1.0, 'Training and Validation Accuracy')

plt.plot(epochs,loss,label='Training_loss',color='blue')
plt.plot(epochs,val_loss,label='Validation_loss',color='red')
plt.legend()
plt.title("Training and Validation loss")

Text(0.5, 1.0, 'Training and Validation loss')

Assessing the Performance of the Model

training_set.class_indices

{'biotite': 0,
 'bornite': 1,
 'chrysocolla': 2,
 'malachite': 3,
 'muscovite': 4,
 'pyrite': 5,
 'quartz': 6}

# Map each class index back to its mineral name
class_dict = {0: 'biotite',
              1: 'bornite',
              2: 'chrysocolla',
              3: 'malachite',
              4: 'muscovite',
              5: 'pyrite',
              6: 'quartz'}
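
The index-to-name mapping does not have to be typed by hand; it can be derived by inverting the `class_indices` dictionary. In the sketch below, the dictionary is copied from the output shown earlier so that the example runs on its own. In the notebook, `training_set.class_indices` could be used directly. The name `index_to_class` is chosen here for illustration.

```python
# Copied from the training_set.class_indices output shown above
class_indices = {'biotite': 0, 'bornite': 1, 'chrysocolla': 2,
                 'malachite': 3, 'muscovite': 4, 'pyrite': 5, 'quartz': 6}

# Invert name -> index into index -> name
index_to_class = {index: name for name, index in class_indices.items()}

print(index_to_class[3])
```

Deriving the mapping this way keeps it in step with the folder names if classes are added or renamed.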

file_path = 'minet/test/biotite/0001.jpg'
test_image = cv2.imread(file_path)
test_image = cv2.cvtColor(test_image, cv2.COLOR_BGR2RGB)
test_image = cv2.resize(test_image, (224, 224), interpolation=cv2.INTER_CUBIC)
plt.imshow(test_image)
# Scale pixels to [0, 1] to match the rescale=1./255 used by the data generators
test_image = np.expand_dims(test_image / 255.0, axis=0)
probs = model.predict(test_image)
pred_class = np.argmax(probs)

pred_class = class_dict[pred_class]
print(pred_class)

biotite

file_path = 'minet/test/bornite/0010.jpg'
test_image = cv2.imread(file_path)
test_image = cv2.cvtColor(test_image, cv2.COLOR_BGR2RGB)
test_image = cv2.resize(test_image, (224, 224), interpolation=cv2.INTER_CUBIC)
plt.imshow(test_image)
# Scale pixels to [0, 1] to match the rescale=1./255 used by the data generators
test_image = np.expand_dims(test_image / 255.0, axis=0)
probs = model.predict(test_image)
pred_class = np.argmax(probs)

pred_class = class_dict[pred_class]
print(pred_class)

bornite

file_path = 'minet/test/malachite/0008.jpg'
test_image = cv2.imread(file_path)
test_image = cv2.cvtColor(test_image, cv2.COLOR_BGR2RGB)
test_image = cv2.resize(test_image, (224, 224), interpolation=cv2.INTER_CUBIC)
plt.imshow(test_image)
# Scale pixels to [0, 1] to match the rescale=1./255 used by the data generators
test_image = np.expand_dims(test_image / 255.0, axis=0)
probs = model.predict(test_image)
pred_class = np.argmax(probs)

pred_class = class_dict[pred_class]
print(pred_class)

malachite
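
Spot checks on single images say little about per-class behaviour, so a confusion matrix over the whole test set is a natural next step. The sketch below builds one in plain Python. The label lists are hypothetical toy values for illustration only, not results from this notebook. In practice they could be obtained from `test_set.classes` and `np.argmax(model.predict(test_set), axis=1)`, which is an assumption about how one would wire it up rather than something the notebook does.

```python
class_names = ['biotite', 'bornite', 'chrysocolla', 'malachite',
               'muscovite', 'pyrite', 'quartz']


def confusion_counts(y_true, y_pred, num_classes):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]


# Hypothetical labels (two samples per class, one bornite mistaken for chrysocolla)
y_true = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
y_pred = [0, 0, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]

matrix = confusion_counts(y_true, y_pred, len(class_names))
recall = per_class_recall(matrix)

for name, value in zip(class_names, recall):
    print(f"{name:12s} recall = {value:.2f}")
```

The notebook already imports `confusion_matrix` and `classification_report` from scikit-learn, which would give the same counts along with precision and F1 scores.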

Saving the Model and Compiling It with the deepC Compiler

model.save("mineral_classification.h5")

!deepCC mineral_classification.h5

reading [keras model] from 'mineral_classification.h5'
Saved 'mineral_classification.onnx'
reading onnx model from file  mineral_classification.onnx
Model info:
  ir_vesion :  5 
  doc       : 
WARN (ONNX): graph-node block1_conv1's attribute auto_pad has no meaningful data.
WARN (ONNX): graph-node block1_conv2's attribute auto_pad has no meaningful data.
WARN (ONNX): graph-node block2_conv1's attribute auto_pad has no meaningful data.
WARN (ONNX): graph-node block2_conv2's attribute auto_pad has no meaningful data.
WARN (ONNX): graph-node block3_conv1's attribute auto_pad has no meaningful data.
WARN (ONNX): graph-node block3_conv2's attribute auto_pad has no meaningful data.
WARN (ONNX): graph-node block3_conv3's attribute auto_pad has no meaningful data.
WARN (ONNX): graph-node block4_conv1's attribute auto_pad has no meaningful data.
WARN (ONNX): graph-node block4_conv2's attribute auto_pad has no meaningful data.
WARN (ONNX): graph-node block4_conv3's attribute auto_pad has no meaningful data.
WARN (ONNX): graph-node block5_conv1's attribute auto_pad has no meaningful data.
WARN (ONNX): graph-node block5_conv2's attribute auto_pad has no meaningful data.
WARN (ONNX): graph-node block5_conv3's attribute auto_pad has no meaningful data.
WARN (ONNX): terminal (input/output) input_1's shape is less than 1.
             changing it to 1.
WARN (ONNX): terminal (input/output) dense_1's shape is less than 1.
             changing it to 1.
WARN (GRAPH): found operator node with the same name (dense_1) as io node.
running DNNC graph sanity check ... passed.
Writing C++ file  mineral_classification_deepC/mineral_classification.cpp
INFO (ONNX): model files are ready in dir mineral_classification_deepC
g++ -std=c++11 -O3 -I. -I/opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/include -isystem /opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/packages/eigen-eigen-323c052e1731 mineral_classification_deepC/mineral_classification.cpp -o mineral_classification_deepC/mineral_classification.exe
Model executable  mineral_classification_deepC/mineral_classification.exe

Output of the df cell in Data Analysis (images per class in the training set):

       Mineral  count
0      biotite     50
1      bornite     50
2  chrysocolla     50
3    malachite     50
4    muscovite     50
5       pyrite     50
6       quartz     50
