
Differentiating pollen-carrying honey bees

Credit: AITS Cainvas Community

Photo by Thinkmojo on Dribbble

This notebook differentiates between images of honey bees that are carrying pollen and those that are not.

Such deep learning models can prove useful in bee farming for automated analysis and inference.

In [1]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from skimage import io, transform
import pandas as pd
from PIL import Image
from sklearn.model_selection import train_test_split
from tensorflow.keras import models, optimizers
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Dense, Flatten
import random

Dataset

The dataset consists of high-resolution images of individual bees on the hive ramp.

The dataset folder (link in the cell below) has the following content:

  • images folder - Contains 300x180 resolution images of bees of both categories. The image file names encode the category - P (pollen) or NP (non-pollen).
  • pollen_data.csv - A .csv file containing the image names and their corresponding labels (a quick consistency check between file names and labels is sketched below).
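As a quick sanity check, the P/NP prefix of each file name can be compared against the pollen_carrying column once the CSV is loaded. A minimal sketch (not part of the original notebook; it assumes the dataset has already been downloaded as in the next cell):

import pandas as pd

df = pd.read_csv('PollenDataset/pollen_data.csv')
prefix_is_pollen = df['filename'].str.startswith('P').astype(int)    # 'P...' -> 1, 'NP...' -> 0
mismatches = (prefix_is_pollen != df['pollen_carrying']).sum()
print("File names whose prefix disagrees with the label:", mismatches)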
In [2]:
!wget -N https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/PollenDataset.zip
!unzip -qo PollenDataset.zip
!rm PollenDataset.zip
--2021-09-08 07:39:40--  https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/PollenDataset.zip
Resolving cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)... 52.219.158.31
Connecting to cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)|52.219.158.31|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10524338 (10M) [application/zip]
Saving to: ‘PollenDataset.zip’

PollenDataset.zip   100%[===================>]  10.04M  --.-KB/s    in 0.06s   

2021-09-08 07:39:40 (179 MB/s) - ‘PollenDataset.zip’ saved [10524338/10524338]

In [3]:
# Going through the csv file

pollen = pd.read_csv('PollenDataset/pollen_data.csv')

# shuffle
pollen = pollen.sample(frac=1, random_state=113).reset_index(drop=True)

pollen
Out[3]:
     Unnamed: 0          filename  pollen_carrying
0           544  NP23667-118r.jpg                0
1           524  NP28349-222r.jpg                0
2           650   P28616-235r.jpg                1
3           596   P12922-186r.jpg                1
4           673   NP52328-19r.jpg                0
..          ...               ...              ...
709         253   NP55879-97r.jpg                0
710         662   P56414-106r.jpg                1
711         586   NP21559-72r.jpg                0
712         162       P268-2r.jpg                1
713         165   P56585-112r.jpg                1

714 rows × 3 columns

In [4]:
# Columns in the dataframe
print("Columns initially in the dataframe: ", list(pollen.columns))

pollen = pollen.drop(columns = pollen.columns[0])

# Columns in the dataframe now
print("Columns currently in the dataframe: ", list(pollen.columns))
Columns initially in the dataframe:  ['Unnamed: 0', 'filename', 'pollen_carrying']
Columns currently in the dataframe:  ['filename', 'pollen_carrying']
In [5]:
# Checking the spread among labels

pollen['pollen_carrying'].value_counts()
Out[5]:
1    369
0    345
Name: pollen_carrying, dtype: int64

It is a fairly balanced dataset.
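Since the two classes are split roughly 52/48, no resampling or class weighting is applied in this notebook. If the imbalance were larger, one option (an illustrative sketch only, not used here) would be to pass class weights to model.fit:

# Illustrative sketch: weights inversely proportional to class frequency.
# Not needed here because the dataset is roughly balanced.
counts = pollen['pollen_carrying'].value_counts()
class_weight = {label: len(pollen) / (2 * count) for label, count in counts.items()}
print(class_weight)    # would be passed as model.fit(..., class_weight=class_weight)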

In [6]:
# Labels

pollen_classes = ["Non Pollen", "Pollen"]    # index matches the pollen_carrying label (0 = NP, 1 = P)

Visualization

In [7]:
plt.figure(figsize=(10, 10))

for i in range(9):
    x = random.randint(0, len(pollen)-1)    # pick random sample
    ax = plt.subplot(3, 3, i + 1)
    row = pollen.loc[x] 
    image = Image.open('PollenDataset/images/' + row['filename'])
    plt.imshow(image)
    plt.title(pollen_classes[row['pollen_carrying']])
    plt.axis("off")

Preprocessing

In [8]:
def buildX(df, rootdir = ''):
    X = []    # list of normalised image arrays
    for i in range(len(df)):    # loop through every row of the dataframe
        row = df.loc[i]
        fname = row['filename']
        img = Image.open(rootdir + fname)    # load the image
        img = np.asarray(img)
        X.append(img/255)    # normalise pixel values to [0, 1] and append to X

    return X
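The images are appended at their native resolution; stacking them into a single NumPy array in the next cell works only because every image in this dataset already has the same 300x180x3 shape. A minimal check (an illustrative sketch, not part of the original notebook):

# Illustrative sketch: confirm all images share one shape before stacking with np.array().
images_list = buildX(pollen, 'PollenDataset/images/')
shapes = {img.shape for img in images_list}
print("Distinct image shapes:", shapes)    # expected: {(300, 180, 3)}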
In [9]:
# building X array from images
X = buildX(pollen, 'PollenDataset/images/')
y = pollen['pollen_carrying']

# as numpy array
X = np.array(X)
y = np.array(y)

# printing the array shapes
print("Shape of one image: ", X[0].shape)
print('The shape of X: ', X.shape)  
print('The shape of y:', y.shape)
Shape of one image:  (300, 180, 3)
The shape of X:  (714, 300, 180, 3)
The shape of y: (714,)
In [10]:
# Creating train and test sets using an 80-20 split
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.20, random_state = 13)

print('Number of samples in:')
print("Training set: ", xtrain.shape[0])
print("Test set: ", xtest.shape[0])

print("\nLets see the spread of labels - ")
print("Training set - \t 1: ", list(ytrain).count(1), "\t0: ", list(ytrain).count(0))
print("Test set - \t 1: ", list(ytest).count(1), "\t0: ", list(ytest).count(0))
Number of samples in:
Training set:  571
Test set:  143

Let's see the spread of labels - 
Training set - 	 1:  298 	0:  273
Test set - 	 1:  71 	0:  72
In [11]:
model = models.Sequential([
    Conv2D(64,(3,3), activation='relu', input_shape=xtrain[0].shape),
    MaxPooling2D(2,2),
    Conv2D(64,(3,3), activation='relu'),
    MaxPooling2D(2, 2),

    Conv2D(128,(3,3), activation='relu'),
    MaxPooling2D(2,2),
    Conv2D(128,(3,3), activation='relu'),
    MaxPooling2D(2, 2),

    Conv2D(256,(3,3), activation='relu'),
    MaxPooling2D(2,2),
    
    Flatten(),
    Dropout(0.4),

    Dense(256, activation = 'relu'),
    Dense(1, activation = 'sigmoid')
    ])

model.compile(loss='binary_crossentropy', optimizer=optimizers.Adam(0.001), metrics=['accuracy'])
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 298, 178, 64)      1792      
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 149, 89, 64)       0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 147, 87, 64)       36928     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 73, 43, 64)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 71, 41, 128)       73856     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 35, 20, 128)       0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 33, 18, 128)       147584    
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 16, 9, 128)        0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 14, 7, 256)        295168    
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 7, 3, 256)         0         
_________________________________________________________________
flatten (Flatten)            (None, 5376)              0         
_________________________________________________________________
dropout (Dropout)            (None, 5376)              0         
_________________________________________________________________
dense (Dense)                (None, 256)               1376512   
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 257       
=================================================================
Total params: 1,932,097
Trainable params: 1,932,097
Non-trainable params: 0
_________________________________________________________________
In [12]:
history = model.fit(xtrain, ytrain, epochs=16, validation_data = (xtest, ytest), verbose = 1)
Epoch 1/16
 2/18 [==>...........................] - ETA: 0s - loss: 0.6949 - accuracy: 0.4375WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0232s vs `on_train_batch_end` time: 0.0446s). Check your callbacks.
18/18 [==============================] - 2s 129ms/step - loss: 0.6996 - accuracy: 0.4939 - val_loss: 0.6906 - val_accuracy: 0.6154
Epoch 2/16
18/18 [==============================] - 1s 71ms/step - loss: 0.6281 - accuracy: 0.6725 - val_loss: 0.6649 - val_accuracy: 0.6294
Epoch 3/16
18/18 [==============================] - 1s 71ms/step - loss: 0.5982 - accuracy: 0.7093 - val_loss: 0.5056 - val_accuracy: 0.7413
Epoch 4/16
18/18 [==============================] - 1s 71ms/step - loss: 0.4575 - accuracy: 0.7811 - val_loss: 0.3893 - val_accuracy: 0.8182
Epoch 5/16
18/18 [==============================] - 1s 71ms/step - loss: 0.3362 - accuracy: 0.8704 - val_loss: 0.2963 - val_accuracy: 0.8322
Epoch 6/16
18/18 [==============================] - 1s 71ms/step - loss: 0.2438 - accuracy: 0.9002 - val_loss: 0.3174 - val_accuracy: 0.8531
Epoch 7/16
18/18 [==============================] - 1s 71ms/step - loss: 0.2337 - accuracy: 0.9089 - val_loss: 0.2500 - val_accuracy: 0.8951
Epoch 8/16
18/18 [==============================] - 1s 71ms/step - loss: 0.2087 - accuracy: 0.9229 - val_loss: 0.3014 - val_accuracy: 0.8671
Epoch 9/16
18/18 [==============================] - 1s 72ms/step - loss: 0.1650 - accuracy: 0.9440 - val_loss: 0.2356 - val_accuracy: 0.8951
Epoch 10/16
18/18 [==============================] - 1s 71ms/step - loss: 0.1597 - accuracy: 0.9475 - val_loss: 0.2379 - val_accuracy: 0.8951
Epoch 11/16
18/18 [==============================] - 1s 72ms/step - loss: 0.1539 - accuracy: 0.9405 - val_loss: 0.2077 - val_accuracy: 0.9161
Epoch 12/16
18/18 [==============================] - 1s 72ms/step - loss: 0.1037 - accuracy: 0.9580 - val_loss: 0.1476 - val_accuracy: 0.9371
Epoch 13/16
18/18 [==============================] - 1s 71ms/step - loss: 0.0925 - accuracy: 0.9702 - val_loss: 0.2141 - val_accuracy: 0.9091
Epoch 14/16
18/18 [==============================] - 1s 72ms/step - loss: 0.1355 - accuracy: 0.9510 - val_loss: 0.2172 - val_accuracy: 0.9301
Epoch 15/16
18/18 [==============================] - 1s 72ms/step - loss: 0.0867 - accuracy: 0.9720 - val_loss: 0.2226 - val_accuracy: 0.9301
Epoch 16/16
18/18 [==============================] - 1s 71ms/step - loss: 0.0757 - accuracy: 0.9737 - val_loss: 0.1594 - val_accuracy: 0.9231
In [13]:
model.evaluate(xtest, ytest)
5/5 [==============================] - 0s 15ms/step - loss: 0.1594 - accuracy: 0.9231
Out[13]:
[0.15937379002571106, 0.9230769276618958]

Plotting the metrics

In [14]:
def plot(history, variable, variable2):
    # plot the training and validation curves of a metric against epochs
    plt.plot(range(len(history[variable])), history[variable])
    plt.plot(range(len(history[variable2])), history[variable2])
    plt.legend([variable, variable2])
    plt.title(variable)
In [15]:
plot(history.history, "accuracy", "val_accuracy")
In [16]:
plot(history.history, "loss", "val_loss")

Prediction

In [17]:
i = random.randint(0, len(xtest)-1)    # pick a random test sample

output = model(np.expand_dims(xtest[i], 0))    # forward pass on a batch of one image

pred = output.numpy()[0][0]    # sigmoid output: probability of class 1 (Pollen)

plt.imshow(xtest[i])
plt.axis('off')

print("Actual label: ", pollen_classes[ytest[i]])
print("Model prediction : ", pollen_classes[int(pred>0.5)], " (", pred, " --> ", int(pred>0.5), ")")
Actual label:  Pollen
Model prediction :  Pollen  ( 1.0  -->  1 )
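The cell above checks only a single random test image. To see how the model behaves across the whole test set, a confusion matrix can be computed; below is a minimal sketch using scikit-learn (not part of the original run):

# Illustrative sketch: confusion matrix and per-class metrics on the test set.
from sklearn.metrics import confusion_matrix, classification_report

probs = model.predict(xtest).flatten()       # sigmoid outputs in [0, 1]
preds = (probs > 0.5).astype(int)            # same 0.5 threshold as the cell above
print(confusion_matrix(ytest, preds))
print(classification_report(ytest, preds, target_names=pollen_classes))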

deepC

In [18]:
model.save('honey_bee_pollen.h5')

!deepCC honey_bee_pollen.h5
[INFO]
Reading [keras model] 'honey_bee_pollen.h5'
[SUCCESS]
Saved 'honey_bee_pollen_deepC/honey_bee_pollen.onnx'
[INFO]
Reading [onnx model] 'honey_bee_pollen_deepC/honey_bee_pollen.onnx'
[INFO]
Model info:
  ir_vesion : 5
  doc       : 
[WARNING]
[ONNX]: terminal (input/output) conv2d_input's shape is less than 1. Changing it to 1.
[WARNING]
[ONNX]: terminal (input/output) dense_1's shape is less than 1. Changing it to 1.
WARN (GRAPH): found operator node with the same name (dense_1) as io node.
[INFO]
Running DNNC graph sanity check ...
[SUCCESS]
Passed sanity check.
[INFO]
Writing C++ file 'honey_bee_pollen_deepC/honey_bee_pollen.cpp'
[INFO]
deepSea model files are ready in 'honey_bee_pollen_deepC/' 
[RUNNING COMMAND]
g++ -std=c++11 -O3 -fno-rtti -fno-exceptions -I. -I/opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/include -isystem /opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/packages/eigen-eigen-323c052e1731 "honey_bee_pollen_deepC/honey_bee_pollen.cpp" -D_AITS_MAIN -o "honey_bee_pollen_deepC/honey_bee_pollen.exe"
[RUNNING COMMAND]
size "honey_bee_pollen_deepC/honey_bee_pollen.exe"
   text	   data	    bss	    dec	    hex	filename
7912125	   3760	    760	7916645	 78cc65	honey_bee_pollen_deepC/honey_bee_pollen.exe
[SUCCESS]
Saved model as executable "honey_bee_pollen_deepC/honey_bee_pollen.exe"
In [ ]:
i = random.randint(0, len(xtest)-1)    # pick a random test sample

np.savetxt('sample.data', (xtest[i]).flatten())    # flatten the image and write it as plain text input

# run the deepSea-compiled executable on the sample
!honey_bee_pollen_deepC/honey_bee_pollen.exe sample.data

nn_out = np.loadtxt('deepSea_result_1.out')    # read back the compiled model's output

plt.imshow(xtest[i])
plt.axis('off')

print()
print("Actual label: ", pollen_classes[ytest[i]])
print("Model prediction : ", pollen_classes[int(nn_out>0.5)], " (", nn_out, " --> ", int(nn_out>0.5), ")")
Warn: conv2d_Relu_0_pooling: auto_pad attribute is deprecated, it'll be ignored.
In [ ]: