Differentiate pollen-carrying honey bees

This notebook differentiates between images of honey bees carrying pollen and those that aren't.

A classifier like this could support beekeeping analysis, for example by automatically counting how many bees photographed on the ramp are carrying pollen.

import random

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from PIL import Image
from sklearn.model_selection import train_test_split
from tensorflow.keras import models, optimizers
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Dense, Flatten

Dataset

The dataset consists of images of individual bees on the ramp.

The dataset folder

The dataset folder (downloaded below) has the following content:

images folder - Contains RGB images of bees of both categories, each 300x180 pixels (height x width). The file names encode the category: P (pollen) or NP (non-pollen).
pollen_data.csv - A CSV file containing the image file names and their labels (1 for pollen, 0 for non-pollen).
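Since the category is encoded in the file name prefix, the labels in pollen_data.csv can be cross-checked against the file names. A minimal sketch; the helper label_from_filename is hypothetical and assumes every name starts with exactly P or NP:

```python
def label_from_filename(fname: str) -> int:
    """Return 1 for a pollen-carrying image (prefix 'P'), 0 for non-pollen (prefix 'NP')."""
    # Check 'NP' first so that the two prefixes can never be confused
    if fname.startswith("NP"):
        return 0
    if fname.startswith("P"):
        return 1
    raise ValueError(f"unexpected file name prefix: {fname!r}")


# File names taken from the shuffled dataframe shown in this notebook
for name in ["NP23667-118r.jpg", "P28616-235r.jpg"]:
    print(name, "->", label_from_filename(name))
```

Applied to the dataframe, (pollen['filename'].map(label_from_filename) == pollen['pollen_carrying']).all() should be True if the CSV and the file names agree.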

!wget -N https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/PollenDataset.zip
!unzip -qo PollenDataset.zip
!rm PollenDataset.zip

--2021-09-08 07:39:40--  https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/PollenDataset.zip
Resolving cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)... 52.219.158.31
Connecting to cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)|52.219.158.31|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10524338 (10M) [application/zip]
Saving to: ‘PollenDataset.zip’

PollenDataset.zip   100%[===================>]  10.04M  --.-KB/s    in 0.06s   

2021-09-08 07:39:40 (179 MB/s) - ‘PollenDataset.zip’ saved [10524338/10524338]
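The ! lines above are Jupyter shell escapes. Outside a notebook, a rough pure-Python equivalent can use the standard library. The function name download_and_extract is hypothetical, and unlike wget -N it always downloads the file again instead of comparing timestamps:

```python
import os
import urllib.request
import zipfile

DATASET_URL = "https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/PollenDataset.zip"


def download_and_extract(url=DATASET_URL, dest_dir=".", archive_name="PollenDataset.zip"):
    """Download a zip archive, extract it into dest_dir, then delete the archive."""
    archive_path = os.path.join(dest_dir, archive_name)
    urllib.request.urlretrieve(url, archive_path)   # roughly `wget` (no timestamp check)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest_dir)                     # roughly `unzip -o`: existing files are overwritten
    os.remove(archive_path)                         # `rm PollenDataset.zip`
```

Calling download_and_extract() from the working directory should leave the PollenDataset folder in place, just like the three shell commands.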

# Reading the csv file

pollen = pd.read_csv('PollenDataset/pollen_data.csv')

# shuffle
pollen = pollen.sample(frac=1, random_state=113).reset_index(drop=True)

pollen

# Columns in the dataframe
print("Columns initially in the dataframe: ", list(pollen.columns))

# Drop the first column ('Unnamed: 0'), an extra index column carried over from the CSV
pollen = pollen.drop(columns = pollen.columns[0])

# Columns in the dataframe now
print("Columns currently in the dataframe: ", list(pollen.columns))

Columns initially in the dataframe:  ['Unnamed: 0', 'filename', 'pollen_carrying']
Columns currently in the dataframe:  ['filename', 'pollen_carrying']
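As an alternative to dropping the extra column afterwards, pd.read_csv can consume it as the index through index_col=0, and reset_index(drop=True) then discards it after shuffling. A sketch on a tiny in-memory CSV whose three rows are copied from this notebook's dataframe:

```python
import io

import pandas as pd

csv_text = """,filename,pollen_carrying
544,NP23667-118r.jpg,0
650,P28616-235r.jpg,1
524,NP28349-222r.jpg,0
"""

# index_col=0 turns the unnamed first column into the index instead of a data column
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Shuffle, then replace the old index with 0..n-1
df = df.sample(frac=1, random_state=113).reset_index(drop=True)
print(list(df.columns))   # ['filename', 'pollen_carrying']
```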

# Checking the spread among labels

pollen['pollen_carrying'].value_counts()

1    369
0    345
Name: pollen_carrying, dtype: int64

The dataset is fairly balanced: 369 pollen-carrying images (label 1) versus 345 non-pollen images (label 0).
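To put a number on "fairly balanced", value_counts(normalize=True) gives each class's share. The largest share is also the accuracy of a trivial classifier that always predicts the majority class, which is a useful reference point for the model's accuracy later. This sketch rebuilds a label column from the counts above; on the real data, pollen['pollen_carrying'].value_counts(normalize=True) gives the same shares:

```python
import pandas as pd

# Rebuild a label column with the counts printed above (369 ones, 345 zeros)
labels = pd.Series([1] * 369 + [0] * 345, name="pollen_carrying")

proportions = labels.value_counts(normalize=True)   # fraction of samples per class
baseline = proportions.max()                        # accuracy of always predicting the majority class

print(proportions.round(3).to_dict())               # {1: 0.517, 0: 0.483}
print(f"Majority-class baseline accuracy: {baseline:.3f}")
```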

# Class names, indexed by label (0 = non-pollen, 1 = pollen-carrying)

pollen_classes = ["Non Pollen", "Pollen"]

Visualization

plt.figure(figsize=(10, 10))

for i in range(9):
    x = random.randint(0, len(pollen)-1)    # pick random sample
    ax = plt.subplot(3, 3, i + 1)
    row = pollen.loc[x] 
    image = Image.open('PollenDataset/images/' + row['filename'])
    plt.imshow(image)
    plt.title(pollen_classes[row['pollen_carrying']])
    plt.axis("off")

Preprocessing

def buildX(df, rootdir=''):
    X = []    # initialising X list
    for fname in df['filename']:    # loop through the dataframe passed in, not the global `pollen`
        img = Image.open(rootdir + fname)
        img = np.asarray(img)
        X.append(img / 255)    # normalize pixel values to [0, 1] and append to X
    return X

# building X array from images
X = buildX(pollen, 'PollenDataset/images/')
y = pollen['pollen_carrying']

# as numpy array
X = np.array(X)
y = np.array(y)

# printing the resulting shapes
print("Shape of one image: ", X[0].shape)
print('The shape of X: ', X.shape)  
print('The shape of y:', y.shape)

Shape of one image:  (300, 180, 3)
The shape of X:  (714, 300, 180, 3)
The shape of y: (714,)
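Because img/255 produces float64 values, X takes up a fair amount of memory. The arithmetic, using the shapes printed above:

```python
import numpy as np

n_images, height, width, channels = 714, 300, 180, 3   # shapes printed above
n_values = n_images * height * width * channels         # total number of pixel values in X

for dtype in (np.float64, np.float32):
    nbytes = n_values * np.dtype(dtype).itemsize          # 8 bytes vs 4 bytes per value
    print(f"{np.dtype(dtype).name}: {nbytes / 1e6:.0f} MB")
```

This prints roughly 925 MB for float64 and 463 MB for float32. Building each image as np.asarray(img, dtype=np.float32) / 255 inside buildX would therefore halve the memory, and Keras layers compute in float32 by default anyway.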

# Creating train-test using an 80-20 split
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.20, random_state = 13)

print('Number of samples in:')
print("Training set: ", xtrain.shape[0])
print("Test set: ", xtest.shape[0])

print("\nLet's see the spread of labels - ")
print("Training set - \t 1: ", list(ytrain).count(1), "\t0: ", list(ytrain).count(0))
print("Test set - \t 1: ", list(ytest).count(1), "\t0: ", list(ytest).count(0))

Number of samples in:
Training set:  571
Test set:  143

Let's see the spread of labels - 
Training set - 	 1:  298 	0:  273
Test set - 	 1:  71 	0:  72
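The split above keeps the classes close to balanced only because of the random draw. Passing stratify to train_test_split makes scikit-learn preserve the class ratio in both subsets. A sketch with stand-in data (the names y_demo and idx are hypothetical, chosen so the notebook's X and y are not overwritten); in the notebook the equivalent call would be train_test_split(X, y, test_size=0.20, random_state=13, stratify=y), which would give a different split from the one behind the results reported here:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in labels with the dataset's class counts (369 pollen, 345 non-pollen);
# sample indices take the place of the images
y_demo = np.array([1] * 369 + [0] * 345)
idx = np.arange(len(y_demo))

# stratify=y_demo keeps the 1/0 ratio (almost) identical in both splits
idx_train, idx_test, y_train_s, y_test_s = train_test_split(
    idx, y_demo, test_size=0.20, random_state=13, stratify=y_demo)

print("Training set - 1:", int(y_train_s.sum()), "0:", int((y_train_s == 0).sum()))
print("Test set     - 1:", int(y_test_s.sum()), "0:", int((y_test_s == 0).sum()))
```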

model = models.Sequential([
    Conv2D(64,(3,3), activation='relu', input_shape=xtrain[0].shape),
    MaxPooling2D(2,2),
    Conv2D(64,(3,3), activation='relu'),
    MaxPooling2D(2, 2),

    Conv2D(128,(3,3), activation='relu'),
    MaxPooling2D(2,2),
    Conv2D(128,(3,3), activation='relu'),
    MaxPooling2D(2, 2),

    Conv2D(256,(3,3), activation='relu'),
    MaxPooling2D(2,2),
    
    Flatten(),
    Dropout(0.4),

    Dense(256, activation = 'relu'),
    Dense(1, activation = 'sigmoid')
    ])

model.compile(loss='binary_crossentropy', optimizer=optimizers.Adam(0.001), metrics=['accuracy'])
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 298, 178, 64)      1792      
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 149, 89, 64)       0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 147, 87, 64)       36928     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 73, 43, 64)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 71, 41, 128)       73856     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 35, 20, 128)       0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 33, 18, 128)       147584    
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 16, 9, 128)        0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 14, 7, 256)        295168    
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 7, 3, 256)         0         
_________________________________________________________________
flatten (Flatten)            (None, 5376)              0         
_________________________________________________________________
dropout (Dropout)            (None, 5376)              0         
_________________________________________________________________
dense (Dense)                (None, 256)               1376512   
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 257       
=================================================================
Total params: 1,932,097
Trainable params: 1,932,097
Non-trainable params: 0
_________________________________________________________________
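The output shapes and parameter counts in the summary follow from a few rules. A 3x3 convolution with the default 'valid' padding trims one pixel from each border, 2x2 max pooling with stride 2 halves each spatial dimension and rounds down, and a layer's parameter count is its weights plus one bias per output unit. This sketch re-derives the summary without TensorFlow:

```python
def conv_params(kernel, c_in, c_out):
    # kernel x kernel weights per input channel per filter, plus one bias per filter
    return kernel * kernel * c_in * c_out + c_out

def dense_params(n_in, n_out):
    # one weight per input-output pair, plus one bias per output unit
    return n_in * n_out + n_out

h, w, c = 300, 180, 3            # input shape from the model summary
total = 0
for filters in (64, 64, 128, 128, 256):
    total += conv_params(3, c, filters)
    h, w, c = h - 2, w - 2, filters   # 3x3 conv, 'valid' padding: lose 1 pixel per border
    h, w = h // 2, w // 2             # 2x2 max pooling, stride 2: halve and round down
    print(f"after conv({filters}) + pool: {(h, w, c)}")

flat = h * w * c                  # Flatten; Dropout adds no parameters
total += dense_params(flat, 256) + dense_params(256, 1)
print("flattened size:", flat)    # 5376
print("total params:", total)     # 1932097
```

Almost three quarters of the parameters (1,376,512 of 1,932,097) sit in the first Dense layer, which is why the spatial size left after the last pooling layer matters so much.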

history = model.fit(xtrain, ytrain, epochs=16, validation_data = (xtest, ytest), verbose = 1)

Epoch 1/16
 2/18 [==>...........................] - ETA: 0s - loss: 0.6949 - accuracy: 0.4375WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0232s vs `on_train_batch_end` time: 0.0446s). Check your callbacks.
18/18 [==============================] - 2s 129ms/step - loss: 0.6996 - accuracy: 0.4939 - val_loss: 0.6906 - val_accuracy: 0.6154
Epoch 2/16
18/18 [==============================] - 1s 71ms/step - loss: 0.6281 - accuracy: 0.6725 - val_loss: 0.6649 - val_accuracy: 0.6294
Epoch 3/16
18/18 [==============================] - 1s 71ms/step - loss: 0.5982 - accuracy: 0.7093 - val_loss: 0.5056 - val_accuracy: 0.7413
Epoch 4/16
18/18 [==============================] - 1s 71ms/step - loss: 0.4575 - accuracy: 0.7811 - val_loss: 0.3893 - val_accuracy: 0.8182
Epoch 5/16
18/18 [==============================] - 1s 71ms/step - loss: 0.3362 - accuracy: 0.8704 - val_loss: 0.2963 - val_accuracy: 0.8322
Epoch 6/16
18/18 [==============================] - 1s 71ms/step - loss: 0.2438 - accuracy: 0.9002 - val_loss: 0.3174 - val_accuracy: 0.8531
Epoch 7/16
18/18 [==============================] - 1s 71ms/step - loss: 0.2337 - accuracy: 0.9089 - val_loss: 0.2500 - val_accuracy: 0.8951
Epoch 8/16
18/18 [==============================] - 1s 71ms/step - loss: 0.2087 - accuracy: 0.9229 - val_loss: 0.3014 - val_accuracy: 0.8671
Epoch 9/16
18/18 [==============================] - 1s 72ms/step - loss: 0.1650 - accuracy: 0.9440 - val_loss: 0.2356 - val_accuracy: 0.8951
Epoch 10/16
18/18 [==============================] - 1s 71ms/step - loss: 0.1597 - accuracy: 0.9475 - val_loss: 0.2379 - val_accuracy: 0.8951
Epoch 11/16
18/18 [==============================] - 1s 72ms/step - loss: 0.1539 - accuracy: 0.9405 - val_loss: 0.2077 - val_accuracy: 0.9161
Epoch 12/16
18/18 [==============================] - 1s 72ms/step - loss: 0.1037 - accuracy: 0.9580 - val_loss: 0.1476 - val_accuracy: 0.9371
Epoch 13/16
18/18 [==============================] - 1s 71ms/step - loss: 0.0925 - accuracy: 0.9702 - val_loss: 0.2141 - val_accuracy: 0.9091
Epoch 14/16
18/18 [==============================] - 1s 72ms/step - loss: 0.1355 - accuracy: 0.9510 - val_loss: 0.2172 - val_accuracy: 0.9301
Epoch 15/16
18/18 [==============================] - 1s 72ms/step - loss: 0.0867 - accuracy: 0.9720 - val_loss: 0.2226 - val_accuracy: 0.9301
Epoch 16/16
18/18 [==============================] - 1s 71ms/step - loss: 0.0757 - accuracy: 0.9737 - val_loss: 0.1594 - val_accuracy: 0.9231

model.evaluate(xtest, ytest)

5/5 [==============================] - 0s 15ms/step - loss: 0.1594 - accuracy: 0.9231

[0.15937379002571106, 0.9230769276618958]
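Because xtest is passed as validation_data during training, the final val_loss and val_accuracy are exactly the numbers model.evaluate reports here. If the test set also guides choices such as the number of epochs, the 92.3% figure becomes an optimistic estimate. A sketch of holding out a separate validation set; the helper three_way_split and its default fractions are hypothetical:

```python
import numpy as np
from sklearn.model_selection import train_test_split


def three_way_split(X, y, test_size=0.20, val_size=0.10, seed=13):
    """Split X, y into train/validation/test sets.

    test_size and val_size are fractions of the *whole* dataset.
    """
    # First carve off the test set, to be used only once at the very end
    x_rest, x_test, y_rest, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y)
    # Rescale so the validation set is val_size of the original data
    val_frac = val_size / (1.0 - test_size)
    x_train, x_val, y_train, y_val = train_test_split(
        x_rest, y_rest, test_size=val_frac, random_state=seed, stratify=y_rest)
    return x_train, x_val, x_test, y_train, y_val, y_test
```

In the notebook this would be called on X and y, followed by model.fit(x_train, y_train, epochs=16, validation_data=(x_val, y_val)) and a single final model.evaluate(x_test, y_test).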

Plotting the metrics

def plot(history, variable, variable2):
    plt.figure()    # new figure so successive calls don't draw on the same axes
    plt.plot(range(len(history[variable])), history[variable])
    plt.plot(range(len(history[variable2])), history[variable2])
    plt.xlabel('epoch')
    plt.legend([variable, variable2])
    plt.title(variable)

plot(history.history, "accuracy", "val_accuracy")

plot(history.history, "loss", "val_loss")

Prediction

i = random.randint(0, len(xtest)-1)

output = model(np.expand_dims(xtest[i], 0))

pred = output.numpy()[0][0]

plt.imshow(xtest[i])
plt.axis('off')

print("Actual label: ", pollen_classes[ytest[i]])
print("Model prediction : ", pollen_classes[int(pred>0.5)], " (", pred, " --> ", int(pred>0.5), ")")

Actual label:  Non Pollen
Model prediction :  Non Pollen  ( 1.0  -->  1 )
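A single random image says little about overall behaviour. A sketch for scoring the whole test set at once, assuming the probabilities come from model.predict(xtest).ravel(). The helper summarize_predictions is hypothetical, while confusion_matrix and classification_report are scikit-learn functions. Labels follow the dataset encoding (0 = non-pollen, 1 = pollen):

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

CLASS_NAMES = ["Non Pollen", "Pollen"]   # index = label


def summarize_predictions(y_true, probs, threshold=0.5):
    """Threshold sigmoid outputs and return (confusion matrix, text report)."""
    y_pred = (np.asarray(probs) > threshold).astype(int)
    # Rows are true labels, columns are predicted labels, both ordered [0, 1]
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    report = classification_report(y_true, y_pred, labels=[0, 1], target_names=CLASS_NAMES)
    return cm, report
```

Usage in the notebook: cm, report = summarize_predictions(ytest, model.predict(xtest).ravel()), then print(cm) and print(report) to see per-class precision and recall.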

deepC

model.save('honey_bee_pollen.h5')

!deepCC honey_bee_pollen.h5

[INFO]
Reading [keras model] 'honey_bee_pollen.h5'
[SUCCESS]
Saved 'honey_bee_pollen_deepC/honey_bee_pollen.onnx'
[INFO]
Reading [onnx model] 'honey_bee_pollen_deepC/honey_bee_pollen.onnx'
[INFO]
Model info:
  ir_vesion : 5
  doc       : 
[WARNING]
[ONNX]: terminal (input/output) conv2d_input's shape is less than 1. Changing it to 1.
[WARNING]
[ONNX]: terminal (input/output) dense_1's shape is less than 1. Changing it to 1.
WARN (GRAPH): found operator node with the same name (dense_1) as io node.
[INFO]
Running DNNC graph sanity check ...
[SUCCESS]
Passed sanity check.
[INFO]
Writing C++ file 'honey_bee_pollen_deepC/honey_bee_pollen.cpp'
[INFO]
deepSea model files are ready in 'honey_bee_pollen_deepC/' 
[RUNNING COMMAND]
g++ -std=c++11 -O3 -fno-rtti -fno-exceptions -I. -I/opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/include -isystem /opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/packages/eigen-eigen-323c052e1731 "honey_bee_pollen_deepC/honey_bee_pollen.cpp" -D_AITS_MAIN -o "honey_bee_pollen_deepC/honey_bee_pollen.exe"
[RUNNING COMMAND]
size "honey_bee_pollen_deepC/honey_bee_pollen.exe"
   text	   data	    bss	    dec	    hex	filename
7912125	   3760	    760	7916645	 78cc65	honey_bee_pollen_deepC/honey_bee_pollen.exe
[SUCCESS]
Saved model as executable "honey_bee_pollen_deepC/honey_bee_pollen.exe"

i = random.randint(0, len(xtest)-1)

np.savetxt('sample.data', (xtest[i]).flatten())  
    
!honey_bee_pollen_deepC/honey_bee_pollen.exe sample.data

nn_out = np.loadtxt('deepSea_result_1.out')

plt.imshow(xtest[i])
plt.axis('off')

print()
print("Actual label: ", pollen_classes[ytest[i]])
print("Model prediction : ", pollen_classes[int(nn_out>0.5)], " (", nn_out, " --> ", int(nn_out>0.5), ")")

Warn: conv2d_Relu_0_pooling: auto_pad attribute is deprecated, it'll be ignored.
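The compiled executable is fed a text file. np.savetxt on the flattened image writes 300 x 180 x 3 = 162,000 values, one per line, in row-major (C) order. The assumption here is that the deepC executable reads them back in the same (height, width, channel) order as the Keras input. The sketch below only checks the NumPy side of that round trip, using a random stand-in image:

```python
import os
import tempfile

import numpy as np

# Stand-in for xtest[i]: one normalized image with shape (height, width, channel)
rng = np.random.default_rng(0)
image = rng.random((300, 180, 3))

path = os.path.join(tempfile.mkdtemp(), "sample.data")
np.savetxt(path, image.flatten())       # one value per line, in row-major (C) order

flat = np.loadtxt(path)                 # reads back a 1-D array of 300*180*3 values
restored = flat.reshape(image.shape)    # the same row-major order restores the image
print(flat.shape, np.allclose(restored, image))
```

In the notebook, comparing nn_out with model.predict(xtest[i:i+1])[0, 0] is a quick check that the compiled executable agrees with the Keras model on the same input.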

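Shuffled pollen dataframe (output of the pollen cell in the Dataset section, before the Unnamed: 0 column is dropped):

	Unnamed: 0	filename	pollen_carrying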
0	544	NP23667-118r.jpg	0
1	524	NP28349-222r.jpg	0
2	650	P28616-235r.jpg	1
3	596	P12922-186r.jpg	1
4	673	NP52328-19r.jpg	0
...	...	...	...
709	253	NP55879-97r.jpg	0
710	662	P56414-106r.jpg	1
711	586	NP21559-72r.jpg	0
712	162	P268-2r.jpg	1
713	165	P56585-112r.jpg	1
