Differentiate pollen carrying honey bees¶
Credit: AITS Cainvas Community
Photo by Thinkmojo on Dribbble
This notebook trains a model to distinguish images of honey bees that are carrying pollen from images of those that are not.
Deep learning models of this kind can be useful in bee farming for analysis and inference.
In [1]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from skimage import io, transform
import pandas as pd
from PIL import Image
from sklearn.model_selection import train_test_split
from tensorflow.keras import models, optimizers
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Dense, Flatten
import random
Dataset¶
The dataset consists of high resolution images of individual bees on the ramp.
The dataset folder¶
The dataset folder (link below) has the following content:
- images folder - Contains 300x180 resolution images of bees of both categories. The image file names contain the categories - P (for pollen) or NP (for non pollen).
- pollen_data.csv - A .csv file containing the image names and corresponding labels.
In [2]:
!wget -N https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/PollenDataset.zip
!unzip -qo PollenDataset.zip
!rm PollenDataset.zip
In [3]:
# Going through the csv file
pollen = pd.read_csv('PollenDataset/pollen_data.csv')
# shuffle
pollen = pollen.sample(frac=1, random_state=113).reset_index(drop=True)
pollen
Out[3]:
In [4]:
# Columns in the dataframe
print("Columns initially in the dataframe: ", list(pollen.columns))
pollen = pollen.drop(columns = pollen.columns[0])
# Columns in the dataframe now
print("Columns currently in the dataframe: ", list(pollen.columns))
In [5]:
# Checking the spread among labels
pollen['pollen_carrying'].value_counts()
Out[5]:
It is a fairly balanced dataset.
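The balance claim can be checked numerically by turning the label counts into fractions. The sketch below is illustrative only: `class_fractions` is a hypothetical helper, not part of the notebook, and it works on any iterable of labels.

```python
from collections import Counter

def class_fractions(labels):
    """Return {label: fraction of samples} for an iterable of labels."""
    labels = list(labels)
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

# Toy labels for illustration; these are not the real dataset counts.
toy_fractions = class_fractions([1, 0, 1, 1])
print(toy_fractions)
```

In the notebook this could be called as `class_fractions(pollen['pollen_carrying'])`; fractions close to 0.5 for both classes would support calling the dataset balanced.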
In [6]:
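# Class names, indexed by the integer label in 'pollen_carrying'.
# The order must match the label encoding used in pollen_data.csv.
pollen_classes = ["Pollen", "Non Pollen"]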
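Because `pollen_classes` is indexed by the integer label, it is worth confirming which label value goes with which file-name category. The sketch below cross-tabulates the file-name prefix against the label. It assumes every file name starts with `P` or `NP`, which the dataset description suggests but does not spell out; `filename_prefix` and `prefix_label_counts` are hypothetical helpers.

```python
from collections import Counter

def filename_prefix(filename):
    """Return 'NP' or 'P' (assumes every name starts with one of the two)."""
    return "NP" if filename.upper().startswith("NP") else "P"

def prefix_label_counts(filenames, labels):
    """Count how often each (prefix, label) pair occurs."""
    return Counter(
        (filename_prefix(name), int(label))
        for name, label in zip(filenames, labels)
    )

# Toy rows for illustration; the real call would use the dataframe columns.
toy_counts = prefix_label_counts(["P1.jpg", "NP2.jpg", "P3.jpg"], [1, 0, 1])
print(toy_counts)
```

In the notebook, `prefix_label_counts(pollen['filename'], pollen['pollen_carrying'])` should show each prefix paired with exactly one label value. Whichever value pairs with the `P` prefix is the index that should hold "Pollen" in `pollen_classes`.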
Visualization¶
In [7]:
plt.figure(figsize=(10, 10))
for i in range(9):
    x = random.randint(0, len(pollen)-1) # pick random sample
    ax = plt.subplot(3, 3, i + 1)
    row = pollen.loc[x]
    image = Image.open('PollenDataset/images/' + row['filename'])
    plt.imshow(image)
    plt.title(pollen_classes[row['pollen_carrying']])
    plt.axis("off")
Preprocessing¶
In [8]:
def buildX(df, rootdir = ''):
    X = [] # initialising X array
    for i in range(len(df)): # loop through the dataframe that was passed in
        row = df.loc[i] # relies on the 0..n-1 index set by reset_index above
        fname = row['filename']
        img = Image.open(rootdir + fname)
        img = np.asarray(img)
        X.append(img/255) # scale pixel values from [0, 255] to [0, 1] and append to X
    return X
In [9]:
# building X array from images
X = buildX(pollen, 'PollenDataset/images/')
y = pollen['pollen_carrying']
# as numpy array
X = np.array(X)
y = np.array(y)
# inspecting the resulting shapes
print("Shape of one image: ", X[0].shape)
print('The shape of X: ', X.shape)
print('The shape of y:', y.shape)
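The preprocessing step divides each image by 255 so that pixel values fall in [0, 1]. Dividing a `uint8` array this way yields `float64`, which uses eight bytes per value. The sketch below shows the same scaling with an optional cast to `float32`; the cast is a suggestion rather than something the notebook does, `normalize_image` is a hypothetical helper, and the 300x180x3 shape of the fake image is an assumption based on the dataset description.

```python
import numpy as np

def normalize_image(img, dtype=np.float32):
    """Scale 8-bit pixel values to [0, 1] and cast to the given dtype."""
    arr = np.asarray(img)
    return (arr / 255).astype(dtype)

# Random stand-in for one RGB image (shape is an assumption).
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(300, 180, 3), dtype=np.uint8)

scaled = normalize_image(fake_image)
print(scaled.dtype, scaled.min(), scaled.max())
```

Storing the array as `float32` halves the memory needed for `X` compared with `float64`, and Keras layers typically compute in `float32` by default.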
In [10]:
# Creating train-test using an 80-20 split
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.20, random_state = 13)
print('Number of samples in:')
print("Training set: ", xtrain.shape[0])
print("Test set: ", xtest.shape[0])
print("\nLet's see the spread of labels - ")
print("Training set - \t 1: ", list(ytrain).count(1), "\t0: ", list(ytrain).count(0))
print("Test set - \t 1: ", list(ytest).count(1), "\t0: ", list(ytest).count(0))
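A plain random split does not guarantee that both sets keep the same label ratio. `train_test_split` accepts a `stratify` argument (for example `stratify=y`) that enforces this. The pure-Python sketch below illustrates the idea behind a stratified split; it is not scikit-learn's implementation, and `stratified_indices` is a hypothetical helper.

```python
import random
from collections import defaultdict

def stratified_indices(labels, test_size=0.2, seed=13):
    """Split sample indices so each class contributes ~test_size to the test set."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)                      # shuffle within each class
        n_test = round(len(idxs) * test_size)  # per-class test quota
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)

# Toy labels: 50 samples of each class.
toy_labels = [0] * 50 + [1] * 50
train_idx, test_idx = stratified_indices(toy_labels)
print(len(train_idx), len(test_idx))
```

With the toy labels, each class contributes 10 of its 50 samples to the test set, so both sets keep the original 50/50 ratio.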
In [11]:
model = models.Sequential([
    Conv2D(64,(3,3), activation='relu', input_shape=xtrain[0].shape),
    MaxPooling2D(2,2),

    Conv2D(64,(3,3), activation='relu'),
    MaxPooling2D(2, 2),

    Conv2D(128,(3,3), activation='relu'),
    MaxPooling2D(2,2),

    Conv2D(128,(3,3), activation='relu'),
    MaxPooling2D(2, 2),

    Conv2D(256,(3,3), activation='relu'),
    MaxPooling2D(2,2),

    Flatten(),
    Dropout(0.4),
    Dense(256, activation = 'relu'),
    Dense(1, activation = 'sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer=optimizers.Adam(0.001), metrics=['accuracy'])
model.summary()
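The shapes reported by `model.summary()` can be reproduced by hand. Each `Conv2D` layer here uses a 3x3 kernel with Keras's default `'valid'` padding, which shrinks each spatial dimension by 2, and each 2x2 max-pooling layer halves it, rounding down. The sketch below applies that arithmetic to the five convolution/pooling blocks. The 300x180 input size is an assumption taken from the dataset description, and the helper names are hypothetical.

```python
def conv_valid(size, kernel=3):
    """Output length of a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1

def max_pool(size, window=2):
    """Output length of non-overlapping max pooling (rounds down)."""
    return size // window

def feature_map_size(height, width, n_blocks=5):
    """Spatial size after n_blocks of Conv2D(3x3) followed by MaxPooling2D(2x2)."""
    for _ in range(n_blocks):
        height = max_pool(conv_valid(height))
        width = max_pool(conv_valid(width))
    return height, width

h, w = feature_map_size(300, 180)      # assumed input size
flat_units = h * w * 256               # 256 filters in the last Conv2D
dense_params = flat_units * 256 + 256  # weights plus biases of Dense(256)
print((h, w), flat_units, dense_params)
```

Under that assumption, the first dense layer holds most of the model's parameters, which is relevant when the model is later compiled for deployment with deepC.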
In [12]:
history = model.fit(xtrain, ytrain, epochs=16, validation_data = (xtest, ytest), verbose = 1)
In [13]:
model.evaluate(xtest, ytest)
Out[13]:
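`model.evaluate` reports loss and accuracy only. A confusion matrix additionally shows which kind of mistake the model makes. The sketch below counts the four outcomes from true labels and sigmoid outputs, using the same 0.5 threshold as the prediction cell. `confusion_counts` is a hypothetical helper, and treating label 1 as the positive class is an assumption.

```python
def confusion_counts(y_true, probs, threshold=0.5):
    """Count true/false positives and negatives for binary predictions."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, prob in zip(y_true, probs):
        pred = int(prob > threshold)  # same rule as int(pred > 0.5)
        if pred == 1 and truth == 1:
            counts["tp"] += 1
        elif pred == 1 and truth == 0:
            counts["fp"] += 1
        elif pred == 0 and truth == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts

# Toy values for illustration; these are not model outputs.
toy_counts = confusion_counts([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.7])
print(toy_counts)
```

In the notebook this could be called as `confusion_counts(ytest, model.predict(xtest).ravel())`.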
Plotting the metrics¶
In [14]:
def plot(history, variable, variable2):
    plt.plot(range(len(history[variable])), history[variable])
    plt.plot(range(len(history[variable2])), history[variable2])
    plt.legend([variable, variable2])
    plt.title(variable)
In [15]:
plot(history.history, "accuracy", "val_accuracy")
In [16]:
plot(history.history, "loss", "val_loss")
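The curves can also be summarised numerically, for example by finding the epoch with the lowest validation loss. The sketch below does this for a history dictionary shaped like `history.history`. `best_epoch` is a hypothetical helper and the toy numbers are made up for illustration.

```python
def best_epoch(history, metric="val_loss"):
    """Return (epoch_index, value) for the minimum of a recorded metric."""
    values = history[metric]
    index = min(range(len(values)), key=values.__getitem__)
    return index, values[index]

# Made-up history for illustration only.
toy_history = {
    "loss": [0.70, 0.52, 0.40, 0.31],
    "val_loss": [0.68, 0.50, 0.45, 0.48],
}
epoch, value = best_epoch(toy_history)
print(epoch, value)
```

Keras provides an `EarlyStopping` callback that can monitor `val_loss` during training and stop once it no longer improves. Note that in this notebook the validation data is the test set, so choosing an epoch this way uses test data for model selection.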
Prediction¶
In [17]:
i = random.randint(0, len(xtest)-1)
output = model(np.expand_dims(xtest[i], 0))
pred = output.numpy()[0][0]
plt.imshow(xtest[i]) # show the same sample that was passed to the model
plt.axis('off')
print("Actual label: ", pollen_classes[ytest[i]])
print("Model prediction : ", pollen_classes[int(pred>0.5)], " (", pred, " --> ", int(pred>0.5), ")")
deepC¶
In [18]:
model.save('honey_bee_pollen.h5')
!deepCC honey_bee_pollen.h5
In [ ]:
i = random.randint(0, len(xtest)-1)
np.savetxt('sample.data', (xtest[i]).flatten())
!honey_bee_pollen_deepC/honey_bee_pollen.exe sample.data
nn_out = np.loadtxt('deepSea_result_1.out')
plt.imshow(xtest[i])
plt.axis('off')
print()
print("Actual label: ", pollen_classes[ytest[i]])
print("Model prediction : ", pollen_classes[int(nn_out>0.5)], " (", nn_out, " --> ", int(nn_out>0.5), ")")
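The compiled executable reads its input from a text file, so the image is flattened before being written with `np.savetxt`. The sketch below shows that this round trip preserves the data: a flattened array is written one value per line and can be reshaped back after `np.loadtxt`. The small array and the temporary path are placeholders, and `roundtrip` is a hypothetical helper.

```python
import os
import tempfile
import numpy as np

def roundtrip(image, path):
    """Write a flattened image as text, read it back, and restore its shape."""
    np.savetxt(path, image.flatten())  # one value per line
    flat = np.loadtxt(path)
    return flat.reshape(image.shape)

# Small stand-in for a normalized image.
toy_image = np.arange(24).reshape(2, 4, 3) / 24

with tempfile.TemporaryDirectory() as tmpdir:
    restored = roundtrip(toy_image, os.path.join(tmpdir, "sample.data"))

print(np.allclose(toy_image, restored))
```

The flattening order matters: `flatten()` uses row-major order by default, so whatever consumes the file has to interpret the values in that same order.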