
Mushroom Classification Using Deep Learning

Credit: AITS Cainvas Community

Photo by Marianna Che on Dribbble

In this project, we examine the data and build a deep neural network that predicts whether a mushroom is edible or poisonous from its attributes, such as cap shape, cap color, and gill color.

In [1]:
!wget https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/mushrooms.csv
--2021-07-13 10:50:44--  https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/mushrooms.csv
Resolving cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)... 52.219.66.96
Connecting to cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)|52.219.66.96|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 374003 (365K) [text/csv]
Saving to: ‘mushrooms.csv.1’

mushrooms.csv.1     100%[===================>] 365.24K  --.-KB/s    in 0.008s  

2021-07-13 10:50:44 (42.2 MB/s) - ‘mushrooms.csv.1’ saved [374003/374003]
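
Note that the file above was saved as ‘mushrooms.csv.1’: a ‘mushrooms.csv’ already existed in the working directory, and wget adds a numeric suffix rather than overwriting it. A minimal sketch of a download step that skips the request when the file is already present could look like the following. The helper name fetch_if_missing is hypothetical and not part of the original notebook.

```python
import os
import urllib.request

URL = "https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/mushrooms.csv"

def fetch_if_missing(url, path):
    """Download url to path unless path already exists.

    Returns True if a download happened, False if it was skipped.
    (Hypothetical helper, shown only as an illustration.)
    """
    if os.path.exists(path):
        return False
    urllib.request.urlretrieve(url, path)
    return True
```

Calling fetch_if_missing(URL, "mushrooms.csv") before reading the data would then leave a single copy of the file on disk.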

Importing the Python libraries and packages

In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense, Dropout, BatchNormalization
from tensorflow.keras.layers import Conv1D, MaxPool1D
from tensorflow.keras.optimizers import Adam

Reading the CSV file of the dataset

The pandas read_csv() function loads a CSV file (in our case, ‘mushrooms.csv’) into a DataFrame.

In [3]:
df = pd.read_csv("mushrooms.csv")

Examining the Data

After importing the data, we’ll use the .head(), .info(), and .describe() methods to learn more about the dataset.

In [4]:
df.head()
Out[4]:
  class cap-shape cap-surface cap-color bruises odor gill-attachment gill-spacing gill-size gill-color ... stalk-surface-below-ring stalk-color-above-ring stalk-color-below-ring veil-type veil-color ring-number ring-type spore-print-color population habitat
0     p         x           s         n       t    p               f            c         n          k ...                        s                      w                      w         p          w           o         p                 k          s       u
1     e         x           s         y       t    a               f            c         b          k ...                        s                      w                      w         p          w           o         p                 n          n       g
2     e         b           s         w       t    l               f            c         b          n ...                        s                      w                      w         p          w           o         p                 n          n       m
3     p         x           y         w       t    p               f            c         n          n ...                        s                      w                      w         p          w           o         p                 k          s       u
4     e         x           s         g       f    n               f            w         b          k ...                        s                      w                      w         p          w           o         e                 n          a       g

5 rows × 23 columns

In [5]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8124 entries, 0 to 8123
Data columns (total 23 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   class                     8124 non-null   object
 1   cap-shape                 8124 non-null   object
 2   cap-surface               8124 non-null   object
 3   cap-color                 8124 non-null   object
 4   bruises                   8124 non-null   object
 5   odor                      8124 non-null   object
 6   gill-attachment           8124 non-null   object
 7   gill-spacing              8124 non-null   object
 8   gill-size                 8124 non-null   object
 9   gill-color                8124 non-null   object
 10  stalk-shape               8124 non-null   object
 11  stalk-root                8124 non-null   object
 12  stalk-surface-above-ring  8124 non-null   object
 13  stalk-surface-below-ring  8124 non-null   object
 14  stalk-color-above-ring    8124 non-null   object
 15  stalk-color-below-ring    8124 non-null   object
 16  veil-type                 8124 non-null   object
 17  veil-color                8124 non-null   object
 18  ring-number               8124 non-null   object
 19  ring-type                 8124 non-null   object
 20  spore-print-color         8124 non-null   object
 21  population                8124 non-null   object
 22  habitat                   8124 non-null   object
dtypes: object(23)
memory usage: 1.4+ MB
In [6]:
df.describe()
Out[6]:
       class cap-shape cap-surface cap-color bruises odor gill-attachment gill-spacing gill-size gill-color ... stalk-surface-below-ring stalk-color-above-ring stalk-color-below-ring veil-type veil-color ring-number ring-type spore-print-color population habitat
count   8124      8124        8124      8124    8124 8124            8124         8124      8124       8124 ...                     8124                   8124                   8124      8124       8124        8124      8124              8124       8124    8124
unique     2         6           4        10       2    9               2            2         2         12 ...                        4                      9                      9         1          4           3         5                 9          6       7
top        e         x           y         n       f    n               f            c         b          b ...                        s                      w                      w         p          w           o         p                 w          v       d
freq    4208      3656        3244      2284    4748 3528            7914         6812      5612       1728 ...                     4936                   4464                   4384      8124       7924        7488      3968              2388       4040    3148

4 rows × 23 columns
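
In the describe() output for categorical columns, ‘top’ is the most frequent value and ‘freq’ is how many rows carry it. As a small worked example of reading that table, the sketch below takes a few (unique, freq) pairs copied from the output above and computes the share of rows covered by the top value. The choice of columns is ours, purely for illustration.

```python
n_rows = 8124

# (unique, freq) pairs copied from the describe() output above.
summary = {
    "class": (2, 4208),
    "gill-attachment": (2, 7914),
    "veil-type": (1, 8124),
    "veil-color": (4, 7924),
}

for column, (unique, freq) in summary.items():
    share = freq / n_rows
    note = " (constant column)" if unique == 1 else ""
    print(f"{column}: top value covers {share:.1%} of rows{note}")

# A column with a single unique value is identical in every row.
constant_columns = [c for c, (unique, _) in summary.items() if unique == 1]
```

Here ‘veil-type’ has only one unique value, so it is the same for every mushroom in the dataset.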

The shape of the dataset

In [7]:
print("Dataset shape:", df.shape)
Dataset shape: (8124, 23)

This shows that our dataset contains 8124 rows (one per mushroom sample) and 23 columns: the ‘class’ label plus 22 attributes such as cap-shape, cap-surface, cap-color, bruises, odor, and gill-size.

Unique occurrences of ‘class’ column

The .unique() method returns the distinct values in the ‘class’ column of the dataset.

In [8]:
df['class'].unique()
Out[8]:
array(['p', 'e'], dtype=object)

‘p’ -> poisonous and ‘e’ -> edible

Count of the unique occurrences of ‘class’ column

The .value_counts() method returns how many times each distinct value occurs.

In [9]:
df['class'].value_counts()
Out[9]:
e    4208
p    3916
Name: class, dtype: int64
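
To see how balanced the two classes are, the counts above can be turned into shares. This is a small sketch using the numbers copied from the output rather than the DataFrame itself.

```python
# Counts copied from the value_counts() output above.
counts = {"e": 4208, "p": 3916}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```

The two classes are close to evenly split, so plain accuracy is a reasonable first metric for this dataset.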

Now let’s visualize the counts of edible and poisonous mushrooms using Seaborn.

In [10]:
count = df['class'].value_counts()
plt.figure(figsize=(8,7))
# Pass x and y as keyword arguments; from seaborn 0.12 they are no longer accepted positionally
sns.barplot(x=count.index, y=count.values, alpha=0.8, palette="prism")
plt.ylabel('Count', fontsize=12)
plt.xlabel('Class', fontsize=12)
plt.title('Number of poisonous/edible mushrooms')
#plt.savefig("mushrooms1.png", format='png', dpi=500)
plt.show()

Data Manipulation

Use one-hot encoding to convert the categorical data to numerical data.

In [11]:
undummy_X = df.iloc[:,1:23]
undummy_y = df.iloc[:, 0]
X = pd.get_dummies(undummy_X)
y = pd.get_dummies(undummy_y)
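
To make the encoding concrete, here is a hand-rolled sketch of what one-hot encoding does to a single column, using the ‘cap-shape’ values from the first five rows shown earlier. The one_hot function is a plain-Python illustration written for this example. It is not how pd.get_dummies is implemented, although the result has the same layout: one 0/1 column per category.

```python
def one_hot(values):
    """Return the sorted categories and one 0/1 row per input value."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows

# cap-shape for the first five mushrooms in df.head()
cap_shape = ["x", "x", "b", "x", "x"]
categories, rows = one_hot(cap_shape)

print(categories)
for row in rows:
    print(row)
```

With pd.get_dummies, the resulting columns would be named by joining the column name and the category, for example cap-shape_b and cap-shape_x.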

Preparing the Data

Split X and y into training and test sets, holding out 20% of the rows for testing.

In [12]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2)
X_test.shape
Out[12]:
(1625, 117)
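
Both numbers in this shape can be checked by hand. The sketch below assumes that train_test_split rounds the test share up to a whole row, which is consistent with the 1625 rows shown, and that one-hot encoding creates one column per unique value. The unique counts are copied from the describe() output; three columns are hidden behind the "..." there, so we only derive what they must add up to.

```python
import math

n_rows = 8124
test_size = 0.2

# Test rows: 20% of 8124 is 1624.8, rounded up to a whole row.
n_test = math.ceil(n_rows * test_size)
n_train = n_rows - n_test

# Unique counts per feature, as visible in the describe() output
# (cap-shape ... gill-color, then stalk-surface-below-ring ... habitat).
visible_unique = [6, 4, 10, 2, 9, 2, 2, 2, 12,
                  4, 9, 9, 1, 4, 3, 5, 9, 6, 7]
n_features = 117

# What the three elided columns must contribute in total.
hidden = n_features - sum(visible_unique)

print(n_train, n_test, sum(visible_unique), hidden)
```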

Building the model

We create a Sequential model and add layers one at a time until we are happy with our network architecture.

In [13]:
classifier=Sequential()
classifier.add(Dense(64,activation='relu',input_dim=117))
classifier.add(Dropout(0.4))
classifier.add(Dense(32,activation='relu'))
classifier.add(Dropout(0.3))
classifier.add(Dense(2,activation='softmax'))

Compile Keras Model

Now that the model is defined, we can compile it.

In [14]:
classifier.compile(optimizer='sgd',loss='binary_crossentropy',metrics=['accuracy'])
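
The model ends in a two-unit softmax but is compiled with binary_crossentropy, whereas categorical_crossentropy is the more conventional pairing for softmax. For one-hot labels and two outputs that sum to 1, the two losses give the same value, so the training log is unaffected by this choice. The sketch below checks this on one made-up prediction. The two functions are plain-Python illustrations of the formulas, not the Keras implementations, and they ignore the clipping of probabilities that Keras applies.

```python
import math

def binary_crossentropy(y_true, y_pred):
    """Mean over outputs of -(t*log(p) + (1-t)*log(1-p))."""
    terms = [-(t * math.log(p) + (1 - t) * math.log(1 - p))
             for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)

def categorical_crossentropy(y_true, y_pred):
    """-sum over classes of t*log(p)."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

y_true = [1, 0]        # one-hot label (made-up example)
y_pred = [0.9, 0.1]    # softmax output, sums to 1 (made-up example)

bce = binary_crossentropy(y_true, y_pred)
cce = categorical_crossentropy(y_true, y_pred)
print(bce, cce)
```

The equality relies on the second output being exactly one minus the first, so it does not carry over to models with more than two classes.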

Model Summary

In [15]:
classifier.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 64)                7552      
_________________________________________________________________
dropout (Dropout)            (None, 64)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 32)                2080      
_________________________________________________________________
dropout_1 (Dropout)          (None, 32)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 66        
=================================================================
Total params: 9,698
Trainable params: 9,698
Non-trainable params: 0
_________________________________________________________________
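
The parameter counts in the summary can be reproduced by hand: each Dense layer has one weight per input-output pair plus one bias per output unit, and Dropout layers add no parameters. A short sketch of that arithmetic:

```python
# Input width followed by the width of each Dense layer.
layer_sizes = [117, 64, 32, 2]

# (inputs + 1 bias) * outputs for each consecutive pair of layers.
params = [(n_in + 1) * n_out
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

print(params, sum(params))
```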

Fit Keras Model

With the model defined and compiled, we can train it using fit(). Here the test set is also passed as the validation data.

In [16]:
history = classifier.fit(X_train, y_train, epochs=15, validation_data=(X_test, y_test), verbose=1)
Epoch 1/15
204/204 [==============================] - 0s 2ms/step - loss: 0.4753 - accuracy: 0.7766 - val_loss: 0.2643 - val_accuracy: 0.8868
Epoch 2/15
204/204 [==============================] - 0s 2ms/step - loss: 0.2673 - accuracy: 0.8958 - val_loss: 0.1515 - val_accuracy: 0.9317
Epoch 3/15
204/204 [==============================] - 0s 2ms/step - loss: 0.1824 - accuracy: 0.9317 - val_loss: 0.0882 - val_accuracy: 0.9742
Epoch 4/15
204/204 [==============================] - 0s 2ms/step - loss: 0.1382 - accuracy: 0.9514 - val_loss: 0.0591 - val_accuracy: 0.9871
Epoch 5/15
204/204 [==============================] - 0s 2ms/step - loss: 0.1040 - accuracy: 0.9631 - val_loss: 0.0426 - val_accuracy: 0.9895
Epoch 6/15
204/204 [==============================] - 0s 2ms/step - loss: 0.0866 - accuracy: 0.9720 - val_loss: 0.0318 - val_accuracy: 0.9957
Epoch 7/15
204/204 [==============================] - 0s 2ms/step - loss: 0.0751 - accuracy: 0.9737 - val_loss: 0.0255 - val_accuracy: 0.9963
Epoch 8/15
204/204 [==============================] - 0s 2ms/step - loss: 0.0591 - accuracy: 0.9802 - val_loss: 0.0203 - val_accuracy: 0.9975
Epoch 9/15
204/204 [==============================] - 0s 2ms/step - loss: 0.0518 - accuracy: 0.9855 - val_loss: 0.0169 - val_accuracy: 0.9975
Epoch 10/15
204/204 [==============================] - 0s 2ms/step - loss: 0.0490 - accuracy: 0.9837 - val_loss: 0.0143 - val_accuracy: 0.9975
Epoch 11/15
204/204 [==============================] - 0s 2ms/step - loss: 0.0376 - accuracy: 0.9885 - val_loss: 0.0124 - val_accuracy: 0.9982
Epoch 12/15
204/204 [==============================] - 0s 2ms/step - loss: 0.0372 - accuracy: 0.9880 - val_loss: 0.0105 - val_accuracy: 0.9988
Epoch 13/15
204/204 [==============================] - 0s 2ms/step - loss: 0.0346 - accuracy: 0.9920 - val_loss: 0.0090 - val_accuracy: 0.9988
Epoch 14/15
204/204 [==============================] - 0s 2ms/step - loss: 0.0294 - accuracy: 0.9932 - val_loss: 0.0080 - val_accuracy: 0.9988
Epoch 15/15
204/204 [==============================] - 0s 2ms/step - loss: 0.0247 - accuracy: 0.9934 - val_loss: 0.0070 - val_accuracy: 0.9988
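
The 204/204 in each epoch line is the number of batches per epoch. Since fit() was called without a batch_size, Keras uses its default of 32, and the last batch is smaller than the rest. The sketch below reproduces the step counts, taking the training size as 8124 minus the 1625 test rows.

```python
import math

batch_size = 32          # Keras default when batch_size is not given
n_train = 8124 - 1625    # rows left for training after the split
n_test = 1625

train_steps = math.ceil(n_train / batch_size)
test_steps = math.ceil(n_test / batch_size)

print(train_steps, test_steps)
```

The second value matches the 51/51 shown when the model is evaluated on the test set.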

Evaluate Keras Model

The evaluate() function returns a list with two values: first the loss of the model on the given dataset, then its accuracy on that dataset.

In [17]:
loss, accuracy = classifier.evaluate(X_test, y_test)
print('Accuracy: %.2f%%' % (accuracy*100))
# The loss is not a percentage, so print it unscaled
print('Loss: %.4f' % loss)
51/51 [==============================] - 0s 945us/step - loss: 0.0070 - accuracy: 0.9988
Accuracy: 99.88%
Loss: 0.0070
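
To put the accuracy in concrete terms, it can be converted back into a number of misclassified test samples. This is a rough sketch based on the rounded accuracy reported for this particular run. Because the split is random, a rerun may give a slightly different figure.

```python
n_test = 1625
reported_accuracy = 0.9988   # rounded value from the output above

# Estimated number of wrong predictions on the test set.
errors = round(n_test * (1 - reported_accuracy))

# Sanity check: that many errors reproduces the reported accuracy.
implied_accuracy = (n_test - errors) / n_test

print(errors, round(implied_accuracy, 4))
```
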
In [18]:
def plot_learningCurve(history, epoch):
  # Plot training & validation accuracy values
  epoch_range = range(1, epoch+1)
  plt.plot(epoch_range, history.history['accuracy'])
  plt.plot(epoch_range, history.history['val_accuracy'])
  plt.title('Model accuracy')
  plt.ylabel('Accuracy')
  plt.xlabel('Epoch')
  plt.legend(['Train', 'Val'], loc='upper left')
  plt.show()

  # Plot training & validation loss values
  plt.plot(epoch_range, history.history['loss'])
  plt.plot(epoch_range, history.history['val_loss'])
  plt.title('Model loss')
  plt.ylabel('Loss')
  plt.xlabel('Epoch')
  plt.legend(['Train', 'Val'], loc='upper left')
  plt.show()

Plotting the curves using the function defined above

In [19]:
plot_learningCurve(history, 15)

Making predictions on the test set and viewing the first ten

In [20]:
y_pred=classifier.predict(X_test)
y_pred=y_pred>0.5
y_pred_int = y_pred.astype(int)
y_pred_int[:10]
Out[20]:
array([[1, 0],
       [0, 1],
       [0, 1],
       [1, 0],
       [0, 1],
       [1, 0],
       [1, 0],
       [1, 0],
       [0, 1],
       [0, 1]])
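
Each row above is a one-hot prediction. To read it as a label, we can map the position of the 1 back to a class. The sketch below assumes the columns of y are ordered ‘e’ then ‘p’, which is the alphabetical order get_dummies produces for the ‘class’ column. The rows are copied from the output above.

```python
# Assumed column order of the one-hot encoded target.
classes = ["e", "p"]
names = {"e": "edible", "p": "poisonous"}

# First ten predictions, copied from the output above.
predictions = [[1, 0], [0, 1], [0, 1], [1, 0], [0, 1],
               [1, 0], [1, 0], [1, 0], [0, 1], [0, 1]]

# Position of the largest entry in each row gives the predicted class.
labels = [classes[row.index(max(row))] for row in predictions]

for label in labels:
    print(label, "->", names[label])
```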

Now, let's save the model

In [22]:
#saving the model to an HDF5 file in the working directory
classifier.save('Mushroom_Classification.h5')

DeepCC

In [ ]:
!deepCC