Fruits Classification

In this notebook, we classify an image of a fruit into one of 33 classes.

Import all the required libraries

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import wget
import os

from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.metrics import classification_report, log_loss, accuracy_score
from sklearn.model_selection import train_test_split
In [2]:
!pip install wget
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: wget in ./.local/lib/python3.7/site-packages (3.2)
WARNING: You are using pip version 20.3.1; however, version 21.1.3 is available.
You should consider upgrading via the '/opt/tljh/user/bin/python -m pip install --upgrade pip' command.
In [3]:
directory = "fruits/train/train"

Download and unzip the dataset containing the fruit images.

In [4]:
!wget -N "https://cainvas-static.s3.amazonaws.com/media/user_data/AmrutaKoshe/fruits.zip"
!unzip -qo fruits.zip
--2021-07-07 18:09:18--  https://cainvas-static.s3.amazonaws.com/media/user_data/AmrutaKoshe/fruits.zip
Resolving cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)... 52.219.62.64
Connecting to cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)|52.219.62.64|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 105408057 (101M) [application/x-zip-compressed]
Saving to: ‘fruits.zip’

fruits.zip          100%[===================>] 100.52M  84.8MB/s    in 1.2s    

2021-07-07 18:09:19 (84.8 MB/s) - ‘fruits.zip’ saved [105408057/105408057]

Extract the class names from the subdirectory names in the training directory; each subdirectory holds the images of one fruit.

In [5]:
# Each entry in the training directory is a folder named after one fruit class.
Name = os.listdir(directory)
print(Name)
print(len(Name))
['Pepper Green', 'Lemon', 'Cantaloupe', 'Passion Fruit', 'Pineapple', 'Apricot', 'Banana', 'Pomegranate', 'Pear', 'Avocado', 'Potato Red', 'Plum', 'Cucumber Ripe', 'Strawberry', 'Cactus fruit', 'Raspberry', 'Tomato', 'Pepper Red', 'Peach', 'Blueberry', 'Onion White', 'Orange', 'Watermelon', 'Kiwi', 'Limes', 'Apple Granny Smith', 'Apple Braeburn', 'Cherry', 'Grape Blue', 'Corn', 'Mango', 'Clementine', 'Papaya']
33

Map each class name to an integer index, and build the reverse mapping from index to class name.

In [6]:
# Forward mapping: class name -> integer index.
fruit_map = {name: index for index, name in enumerate(Name)}
print(fruit_map)
# Reverse mapping: integer index -> class name.
r_fruit_map = dict(enumerate(Name))
{'Pepper Green': 0, 'Lemon': 1, 'Cantaloupe': 2, 'Passion Fruit': 3, 'Pineapple': 4, 'Apricot': 5, 'Banana': 6, 'Pomegranate': 7, 'Pear': 8, 'Avocado': 9, 'Potato Red': 10, 'Plum': 11, 'Cucumber Ripe': 12, 'Strawberry': 13, 'Cactus fruit': 14, 'Raspberry': 15, 'Tomato': 16, 'Pepper Red': 17, 'Peach': 18, 'Blueberry': 19, 'Onion White': 20, 'Orange': 21, 'Watermelon': 22, 'Kiwi': 23, 'Limes': 24, 'Apple Granny Smith': 25, 'Apple Braeburn': 26, 'Cherry': 27, 'Grape Blue': 28, 'Corn': 29, 'Mango': 30, 'Clementine': 31, 'Papaya': 32}
In [7]:
len(fruit_map)
Out[7]:
33
In [8]:
def mapper(value):
    return r_fruit_map[value]
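
One thing worth checking with this mapping: os.listdir returns names in arbitrary order, while the Keras flow_from_directory method used later in this notebook documents that label indices follow the alphanumeric order of the subdirectory names. If the two orders differ, mapper would return the wrong name for a predicted index. The following is a minimal sketch of a mapping built in sorted order, assuming that Python's sorted() matches the generator's ordering. The short class list and the names sorted_names, name_to_index and index_to_name are illustrative only.

```python
# Illustrative subset of class names, in an arbitrary order such as os.listdir might return.
names = ['Pepper Green', 'Lemon', 'Cantaloupe', 'Banana']

# Sort the names so that the indices follow alphanumeric order.
sorted_names = sorted(names)

# Forward (name -> index) and reverse (index -> name) mappings.
name_to_index = {name: i for i, name in enumerate(sorted_names)}
index_to_name = dict(enumerate(sorted_names))

print(name_to_index)
print(index_to_name[0])
```

Once the training generator exists, its class_indices attribute is the authoritative source; the reverse mapping could then be built as {v: k for k, v in train_generator.class_indices.items()}.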

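Use ImageDataGenerator to augment the data: random flips, rotations, shifts and zooms produce altered versions of the existing images, so the model sees more varied training examples. The generator also rescales pixel values to the range [0, 1] and reserves 20% of the images for validation.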

In [9]:
img_datagen = ImageDataGenerator(rescale=1./255,
                                vertical_flip=True,
                                horizontal_flip=True,
                                rotation_range=40,
                                width_shift_range=0.2,
                                height_shift_range=0.2,
                                zoom_range=0.1,
                                validation_split=0.2)
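
To make the rescale and flip options concrete, here is a small NumPy sketch of what they do to pixel values. It is only an illustration on a hypothetical 2x3 array, not the generator's actual implementation; the generator applies the flips at random, whereas here they are applied unconditionally.

```python
import numpy as np

# Hypothetical 2x3 single-channel "image" with 8-bit pixel values.
image = np.array([[0, 51, 102],
                  [153, 204, 255]], dtype=np.float32)

# rescale=1./255 multiplies every pixel by 1/255, mapping [0, 255] to [0, 1].
rescaled = image * (1.0 / 255)

# A horizontal flip mirrors the image left to right (reverses the column order).
h_flipped = rescaled[:, ::-1]

# A vertical flip mirrors the image top to bottom (reverses the row order).
v_flipped = rescaled[::-1, :]

print(rescaled)
print(h_flipped)
print(v_flipped)
```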
In [10]:
test_datagen = ImageDataGenerator(rescale=1./255)

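Split the training dataset into a training set and a validation set. Both generators are created from img_datagen, so the same augmentations are also applied to the validation images.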

In [11]:
train_generator = img_datagen.flow_from_directory(directory,
                                                 shuffle=True,
                                                 batch_size=32,
                                                 subset='training',
                                                 target_size=(100, 100))
Found 13309 images belonging to 33 classes.
In [12]:
valid_generator = img_datagen.flow_from_directory(directory,
                                                 shuffle=True,
                                                 batch_size=16,
                                                 subset='validation',
                                                 target_size=(100, 100))
Found 3314 images belonging to 33 classes.
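
As a quick sanity check, the two counts reported above can be compared with validation_split=0.2. The sketch below only does the arithmetic on those reported numbers; the variable names are illustrative.

```python
# Image counts reported by flow_from_directory above.
n_train = 13309
n_valid = 3314

# Total number of images and the fraction that ended up in the validation subset.
n_total = n_train + n_valid
valid_fraction = n_valid / n_total

print(n_total)
print(round(valid_fraction, 4))
```

The fraction comes out slightly under 0.2, likely because the split is applied to each class folder separately and rounded down to whole images.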
In [13]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,Conv2D,MaxPooling2D,Dropout,Flatten,Activation,BatchNormalization
from tensorflow.keras.models import model_from_json
from tensorflow.keras.models import load_model

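Define a sequential model: six convolution and max-pooling blocks, followed by a dense layer with dropout and a softmax output layer with one unit per class.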

In [14]:
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(3,3),input_shape=(100,100,3), activation='relu', padding = 'same'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu', padding = 'same'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu', padding = 'same'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu', padding = 'same'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu', padding = 'same'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu', padding = 'same'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())

model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(0.5))

model.add(Dense(len(fruit_map)))
model.add(Activation('softmax'))

model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 100, 100, 32)      896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 50, 50, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 50, 50, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 25, 25, 64)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 25, 25, 64)        36928     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 12, 12, 64)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 12, 12, 64)        36928     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 6, 6, 64)          0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 6, 6, 64)          36928     
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 3, 3, 64)          0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 1, 1, 64)          0         
_________________________________________________________________
flatten (Flatten)            (None, 64)                0         
_________________________________________________________________
dense (Dense)                (None, 256)               16640     
_________________________________________________________________
activation (Activation)      (None, 256)               0         
_________________________________________________________________
dropout (Dropout)            (None, 256)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 33)                8481      
_________________________________________________________________
activation_1 (Activation)    (None, 33)                0         
=================================================================
Total params: 192,225
Trainable params: 192,225
Non-trainable params: 0
_________________________________________________________________
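
The shapes and parameter counts in the summary can be reproduced by hand. The sketch below assumes the usual counting rules: a convolution layer has one weight per kernel position and input channel plus one bias per filter, and a dense layer has one weight per input plus one bias per output unit. The helper names conv2d_params and dense_params are illustrative.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights per filter (kernel_h * kernel_w * in_channels) plus one bias, times filters."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_units, out_units):
    """One weight per input plus one bias, for each output unit."""
    return (in_units + 1) * out_units

# Channel counts through the six convolution layers: 3 -> 32 -> 64 -> 64 -> 64 -> 64 -> 64.
channels = [3, 32, 64, 64, 64, 64, 64]
conv_counts = [conv2d_params(3, 3, c_in, c_out)
               for c_in, c_out in zip(channels[:-1], channels[1:])]

# Each 2x2 max-pooling layer halves the spatial size, rounding down.
size = 100
sizes = []
for _ in range(6):
    size //= 2
    sizes.append(size)

# Flatten yields size * size * channels features after the last pooling layer.
flat = sizes[-1] * sizes[-1] * channels[-1]
dense_counts = [dense_params(flat, 256), dense_params(256, 33)]

total = sum(conv_counts) + sum(dense_counts)
print(conv_counts)
print(sizes)
print(dense_counts)
print(total)
```

With padding='same' the convolutions keep the spatial size, so only the pooling layers shrink the feature maps, down to 1x1 before Flatten.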
In [15]:
model.compile(optimizer='adam',
             loss='categorical_crossentropy',
             metrics=['accuracy'])
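
The categorical cross-entropy loss expects one-hot labels, which is what flow_from_directory produces with its default class_mode of 'categorical'. As a rough illustration of what the loss measures, here is a NumPy sketch on a hypothetical batch of two samples and three classes; it is not the Keras implementation, and the clipping constant eps is an assumption made to avoid log(0).

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over the batch of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Hypothetical one-hot labels and softmax outputs.
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])

loss = categorical_crossentropy(y_true, y_pred)
print(loss)
```

Only the probability assigned to the true class contributes to each sample's loss, so the loss falls toward zero as that probability approaches 1.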
In [16]:
history = model.fit(train_generator, validation_data=valid_generator,
                   steps_per_epoch=train_generator.n//train_generator.batch_size,
                   validation_steps=valid_generator.n//valid_generator.batch_size,
                   epochs=10)
Epoch 1/10
415/415 [==============================] - 43s 103ms/step - loss: 2.1697 - accuracy: 0.3067 - val_loss: 0.8547 - val_accuracy: 0.6739
Epoch 2/10
415/415 [==============================] - 42s 102ms/step - loss: 0.7189 - accuracy: 0.7288 - val_loss: 0.3619 - val_accuracy: 0.8514
Epoch 3/10
415/415 [==============================] - 43s 103ms/step - loss: 0.3826 - accuracy: 0.8639 - val_loss: 0.2193 - val_accuracy: 0.9215
Epoch 4/10
415/415 [==============================] - 43s 102ms/step - loss: 0.2220 - accuracy: 0.9222 - val_loss: 0.1437 - val_accuracy: 0.9478
Epoch 5/10
415/415 [==============================] - 43s 104ms/step - loss: 0.2096 - accuracy: 0.9319 - val_loss: 0.0678 - val_accuracy: 0.9789
Epoch 6/10
415/415 [==============================] - 43s 103ms/step - loss: 0.1355 - accuracy: 0.9560 - val_loss: 0.0984 - val_accuracy: 0.9635
Epoch 7/10
415/415 [==============================] - 44s 106ms/step - loss: 0.1193 - accuracy: 0.9596 - val_loss: 0.0309 - val_accuracy: 0.9900
Epoch 8/10
415/415 [==============================] - 43s 103ms/step - loss: 0.1169 - accuracy: 0.9626 - val_loss: 0.0714 - val_accuracy: 0.9740
Epoch 9/10
415/415 [==============================] - 43s 104ms/step - loss: 0.1030 - accuracy: 0.9677 - val_loss: 0.0205 - val_accuracy: 0.9934
Epoch 10/10
415/415 [==============================] - 42s 102ms/step - loss: 0.0963 - accuracy: 0.9704 - val_loss: 0.0706 - val_accuracy: 0.9749
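
The 415 steps shown for each epoch follow from the floor division used for steps_per_epoch. A short sketch of that arithmetic, using the image counts and batch sizes from the generators above (the variable names are illustrative):

```python
# Image counts and batch sizes taken from the two generators.
n_train, train_batch = 13309, 32
n_valid, valid_batch = 3314, 16

# Floor division counts only full batches.
steps_per_epoch = n_train // train_batch
validation_steps = n_valid // valid_batch

# Images beyond the last full batch in each subset.
train_leftover = n_train - steps_per_epoch * train_batch
valid_leftover = n_valid - validation_steps * valid_batch

print(steps_per_epoch, train_leftover)
print(validation_steps, valid_leftover)
```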

Plot the loss and accuracy curves to understand the performance of our model.

In [17]:
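# Loss curves for the training and validation sets.
plt.plot(history.history['loss'], label='Training')
plt.plot(history.history['val_loss'], label='Validation')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()

# Accuracy curves for the training and validation sets.
plt.plot(history.history['accuracy'], label='Training')
plt.plot(history.history['val_accuracy'], label='Validation')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.title('Training and validation accuracy')
plt.legend()
plt.show()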
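
Besides plotting, the history can be summarized numerically. The sketch below copies the validation metrics from the training log above into plain lists (in the notebook they would come from history.history) and finds the best epoch for each metric.

```python
import numpy as np

# Validation metrics per epoch, copied from the training log.
val_accuracy = [0.6739, 0.8514, 0.9215, 0.9478, 0.9789,
                0.9635, 0.9900, 0.9740, 0.9934, 0.9749]
val_loss = [0.8547, 0.3619, 0.2193, 0.1437, 0.0678,
            0.0984, 0.0309, 0.0714, 0.0205, 0.0706]

# Epochs are numbered from 1, list indices from 0.
best_acc_epoch = int(np.argmax(val_accuracy)) + 1
best_loss_epoch = int(np.argmin(val_loss)) + 1

print(best_acc_epoch, val_accuracy[best_acc_epoch - 1])
print(best_loss_epoch, val_loss[best_loss_epoch - 1])
```

The validation metrics fluctuate from epoch to epoch, so the final epoch is not necessarily the best one.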

Making Predictions

In [21]:
load_img("fruits/test/test/0120.jpg",target_size=(180,180))
Out[21]:

Select an image from the test set, preprocess it the same way as the training images, and feed it to the model to make a prediction.

In [22]:
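# Load the image at the input size the model expects.
image = load_img("fruits/test/test/0030.jpg", target_size=(100, 100))

# Convert to a float array and rescale to [0, 1], as the training generator does.
image = img_to_array(image) / 255.0
# Add a batch dimension: (100, 100, 3) -> (1, 100, 100, 3).
prediction_image = np.expand_dims(image, axis=0)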
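
The model was built with input_shape=(100, 100, 3) and predicts on batches, so a single image needs an extra leading axis. A small sketch of the shape change, using a random array as a hypothetical stand-in for the loaded image:

```python
import numpy as np

# Hypothetical stand-in for img_to_array(load_img(...)): a 100x100 RGB image with values 0..255.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(100, 100, 3)).astype(np.float32)

# Rescale to [0, 1], then add the batch axis in front.
image = image / 255.0
batch = np.expand_dims(image, axis=0)

print(image.shape)
print(batch.shape)
```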
In [23]:
# The model returns one probability per class; pick the most likely class index.
prediction = model.predict(prediction_image)
value = np.argmax(prediction)
fruit_name = mapper(value)
print("Prediction is {}.".format(fruit_name))
Prediction is Pineapple.
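
The softmax output holds a probability for every class, so it can also be used to list the runners-up. The sketch below ranks the top three classes with np.argsort. The probability vector and the five-class index_to_name mapping are hypothetical; in the notebook these roles are played by prediction and r_fruit_map.

```python
import numpy as np

# Hypothetical softmax output for one image over five classes, shape (1, 5).
prediction = np.array([[0.05, 0.10, 0.60, 0.05, 0.20]])
# Hypothetical index -> name mapping.
index_to_name = {0: 'Apricot', 1: 'Banana', 2: 'Pineapple', 3: 'Plum', 4: 'Mango'}

# Drop the batch axis, then sort class indices by descending probability.
probs = prediction[0]
top3 = np.argsort(probs)[::-1][:3]
top3_named = [(index_to_name[int(i)], float(probs[i])) for i in top3]

# For a single-image batch, argmax over the whole array gives the same top class.
best = int(np.argmax(prediction))

print(top3_named)
print(index_to_name[best])
```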