Mushroom Classification Using Deep Learning¶
Credit: AITS Cainvas Community
Photo by Marianna Che on Dribbble
In this project, we will examine the data and build a deep neural network that predicts whether a mushroom is edible or poisonous from its attributes such as cap shape, cap color, gill color, etc.¶
!wget https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/mushrooms.csv
Importing the python libraries and packages¶
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense, Dropout, BatchNormalization
from tensorflow.keras.layers import Conv1D, MaxPool1D
from tensorflow.keras.optimizers import Adam
from sklearn import metrics
Reading the CSV file of the dataset¶
The pandas read_csv() function reads a CSV file (in our case, ‘mushrooms.csv’) into a DataFrame.
df = pd.read_csv("mushrooms.csv")
Examining the Data¶
After importing the data, we’ll use the .head(), .info(), and .describe() methods to learn more about the dataset.
df.head()
df.info()
df.describe()
The shape of the dataset¶
print("Dataset shape:", df.shape)
This shows that our dataset contains 8124 rows (instances of mushrooms) and 23 columns (the class label plus attributes such as cap-shape, cap-surface, cap-color, bruises, odor, gill-size, etc.).
Unique occurrences of ‘class’ column¶
The .unique() method returns the unique values in the ‘class’ column of the dataset.
df['class'].unique()
‘p’ -> poisonous and ‘e’ -> edible
Count of the unique occurrences of ‘class’ column¶
The .value_counts() method returns the count of each unique value.
df['class'].value_counts()
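For readability, we can also map the single-letter codes to full names before counting. This is purely for display; the original codes are kept for the rest of the notebook, and the label_names dictionary below is our own addition.
# Display-only mapping of the class codes to readable names
label_names = {'e': 'edible', 'p': 'poisonous'}
print(df['class'].map(label_names).value_counts())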
Now let’s visualize the count of edible and poisonous mushrooms using Seaborn¶
count = df['class'].value_counts()
plt.figure(figsize=(8,7))
sns.barplot(x=count.index, y=count.values, alpha=0.8, palette="prism")
plt.ylabel('Count', fontsize=12)
plt.xlabel('Class', fontsize=12)
plt.title('Number of poisonous/edible mushrooms')
#plt.savefig("mushrooms1.png", format='png', dpi=500)
plt.show()
Data Manipulation¶
Use one-hot encoding to convert the categorical data into numerical data.
undummy_X = df.iloc[:,1:23]
undummy_y = df.iloc[:, 0]
X = pd.get_dummies(undummy_X)
y = pd.get_dummies(undummy_y)
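As a quick sanity check (our own addition, not part of the original workflow), the encoded feature matrix should have 117 columns, which matches the input_dim used when building the model below, and the one-hot label matrix should have 2 columns.
# Sanity check: expect 117 one-hot feature columns and 2 one-hot label columns
print("X shape:", X.shape)
print("y shape:", y.shape)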
Preparing the Data¶
Splitting X and y into training (80%) and test (20%) sets.
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2)
X_test.shape
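For completeness, here is a small additional check of all four split shapes; with test_size=0.2 on 8124 rows, roughly 80% of the instances go to training and the rest to testing.
# Inspect the sizes of the train/test splits
print("Train:", X_train.shape, y_train.shape)
print("Test:", X_test.shape, y_test.shape)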
Building the model¶
We create a Sequential model and add layers one at a time until we are happy with our network architecture.
classifier = Sequential()
classifier.add(Dense(64, activation='relu', input_dim=117))   # 117 one-hot encoded input features
classifier.add(Dropout(0.4))
classifier.add(Dense(32, activation='relu'))
classifier.add(Dropout(0.3))
classifier.add(Dense(2, activation='softmax'))                # 2 output classes: edible, poisonous
Compile Keras Model¶
Now that the model is defined, we can compile it. Because the output layer is a two-unit softmax trained on one-hot encoded labels, we use categorical cross-entropy as the loss.
classifier.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
Model Summary¶
classifier.summary()
Fit Keras Model¶
We have defined and compiled our model; now we train it for 15 epochs, using the test split as validation data.
history = classifier.fit(X_train, y_train, epochs=15, validation_data=(X_test, y_test), verbose=1)
Evaluate Keras Model¶
The evaluate() function returns two values: the loss of the model on the test data and its accuracy.
loss, accuracy = classifier.evaluate(X_test, y_test)
print('Accuracy: %.2f' % (accuracy*100))
print('Loss: %.4f' % loss)
def plot_learningCurve(history, epoch):
    # Plot training & validation accuracy values
    epoch_range = range(1, epoch+1)
    plt.plot(epoch_range, history.history['accuracy'])
    plt.plot(epoch_range, history.history['val_accuracy'])
    plt.title('Model accuracy')
    plt.ylabel('Accuracy')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Val'], loc='upper left')
    plt.show()

    # Plot training & validation loss values
    plt.plot(epoch_range, history.history['loss'])
    plt.plot(epoch_range, history.history['val_loss'])
    plt.title('Model loss')
    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Val'], loc='upper left')
    plt.show()
Plotting the curves using the function defined above¶
plot_learningCurve(history, 15)
Making predictions on some values¶
# Predict class probabilities and threshold at 0.5 to get integer one-hot predictions
y_pred = classifier.predict(X_test)
y_pred = y_pred > 0.5
y_pred_int = y_pred.astype(int)
y_pred_int[:10]
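Since sklearn's metrics module is already imported, we can also summarize the predictions with a confusion matrix and classification report. This sketch collapses the one-hot outputs back to class indices with argmax (pd.get_dummies orders the columns alphabetically, so index 0 is ‘e’ and index 1 is ‘p’).
# Collapse one-hot outputs to class indices (column order from get_dummies: 'e', 'p')
y_pred_classes = np.argmax(classifier.predict(X_test), axis=1)
y_true_classes = np.argmax(y_test.values, axis=1)
print(metrics.confusion_matrix(y_true_classes, y_pred_classes))
print(metrics.classification_report(y_true_classes, y_pred_classes, target_names=['edible', 'poisonous']))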
Now, let's save the model¶
# saving the trained model to a local file (the filename is our choice)
classifier.save('mushroom_classifier.h5')
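As an optional check (not in the original notebook), the saved file can be loaded back and re-evaluated to confirm it round-trips correctly.
# Reload the saved model and confirm it still evaluates on the test set
from tensorflow.keras.models import load_model
restored = load_model('mushroom_classifier.h5')
restored.evaluate(X_test, y_test, verbose=0)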
DeepCC¶
!deepCC mushroom_classifier.h5