Mushroom Classification Using Deep Learning¶
Credit: AITS Cainvas Community
Photo by Marianna Che on Dribbble
In this project, we will examine the data and build a deep neural network that predicts whether a mushroom is edible or poisonous from its attributes such as cap shape, cap color, gill color, etc.¶
!wget https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/mushrooms.csv
Importing the python libraries and packages¶
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense, Dropout, BatchNormalization
from tensorflow.keras.layers import Conv1D, MaxPool1D
from tensorflow.keras.optimizers import Adam
from sklearn import metrics
Reading the CSV file of the dataset¶
The pandas read_csv() function reads a CSV file (in our case, ‘mushrooms.csv’) into a DataFrame.
df = pd.read_csv("mushrooms.csv")
Examining the Data¶
After importing the data, we’ll use the .head(), .info(), and .describe() methods to learn more about the dataset.
df.head()
df.info()
df.describe()
The shape of the dataset¶
print("Dataset shape:", df.shape)
This shows that our dataset contains 8124 rows (instances of mushrooms) and 23 columns (the class label plus attributes such as cap-shape, cap-surface, cap-color, bruises, odor, gill-size, etc.).
Unique occurrences of ‘class’ column¶
The .unique() method returns the unique values in the ‘class’ column of the dataset.
df['class'].unique()
‘p’ -> poisonous and ‘e’ -> edible
Count of the unique occurrences of ‘class’ column¶
The .value_counts() method returns the count of each unique value.
df['class'].value_counts()
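For readability, we can also map the single-letter codes to full names before counting. This is purely for display; the original codes are kept for the rest of the notebook, and the label_names dictionary below is our own addition.
# Display-only mapping of the class codes to readable names
label_names = {'e': 'edible', 'p': 'poisonous'}
print(df['class'].map(label_names).value_counts())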
Now let’s visualize the count of edible and poisonous mushrooms using Seaborn¶
count = df['class'].value_counts()
plt.figure(figsize=(8,7))
sns.barplot(x=count.index, y=count.values, alpha=0.8, palette="prism")
plt.ylabel('Count', fontsize=12)
plt.xlabel('Class', fontsize=12)
plt.title('Number of poisonous/edible mushrooms')
#plt.savefig("mushrooms1.png", format='png', dpi=500)
plt.show()
Data Manipulation¶
Use one-hot encoding to convert the categorical data into numerical data.
undummy_X = df.iloc[:,1:23]
undummy_y = df.iloc[:, 0]
X = pd.get_dummies(undummy_X)
y = pd.get_dummies(undummy_y)
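As a quick sanity check (our own addition, not part of the original workflow), the encoded feature matrix should have 117 columns, which matches the input_dim used when building the model below, and the one-hot label matrix should have 2 columns.
# Sanity check: expect 117 one-hot feature columns and 2 one-hot label columns
print("X shape:", X.shape)
print("y shape:", y.shape)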
Preparing the Data¶
Splitting X and y into training (80%) and test (20%) sets.
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2)
X_test.shape
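For completeness, here is a small additional check of all four split shapes; with test_size=0.2 on 8124 rows, roughly 80% of the instances go to training and the rest to testing.
# Inspect the sizes of the train/test splits
print("Train:", X_train.shape, y_train.shape)
print("Test:", X_test.shape, y_test.shape)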
Building the model¶
We create a Sequential model and add layers one at a time until we are happy with our network architecture.
classifier = Sequential()
classifier.add(Dense(64, activation='relu', input_dim=117))   # 117 one-hot encoded input features
classifier.add(Dropout(0.4))
classifier.add(Dense(32, activation='relu'))
classifier.add(Dropout(0.3))
classifier.add(Dense(2, activation='softmax'))                # 2 output classes: edible, poisonous
Compile Keras Model¶
Now that the model is defined, we can compile it. Because the output layer is a two-unit softmax trained on one-hot encoded labels, we use categorical cross-entropy as the loss.
classifier.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
Model Summary¶
classifier.summary()
Fit Keras Model¶
We have defined and compiled our model; now we train it for 15 epochs, using the test split as validation data.
history = classifier.fit(X_train, y_train, epochs=15, validation_data=(X_test, y_test), verbose=1)
Evaluate Keras Model¶
The evaluate() function returns two values: the loss of the model on the test data and its accuracy.
loss, accuracy = classifier.evaluate(X_test, y_test)
print('Accuracy: %.2f' % (accuracy*100))
print('Loss: %.4f' % loss)
def plot_learningCurve(history, epoch):
    # Plot training & validation accuracy values
    epoch_range = range(1, epoch+1)
    plt.plot(epoch_range, history.history['accuracy'])
    plt.plot(epoch_range, history.history['val_accuracy'])
    plt.title('Model accuracy')
    plt.ylabel('Accuracy')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Val'], loc='upper left')
    plt.show()

    # Plot training & validation loss values
    plt.plot(epoch_range, history.history['loss'])
    plt.plot(epoch_range, history.history['val_loss'])
    plt.title('Model loss')
    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Val'], loc='upper left')
    plt.show()
Plotting the curves using the function defined above¶
plot_learningCurve(history, 15)
Making predictions on some values¶
# Predict class probabilities and threshold at 0.5 to get integer one-hot predictions
y_pred = classifier.predict(X_test)
y_pred = y_pred > 0.5
y_pred_int = y_pred.astype(int)
y_pred_int[:10]
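Since sklearn's metrics module is already imported, we can also summarize the predictions with a confusion matrix and classification report. This sketch collapses the one-hot outputs back to class indices with argmax (pd.get_dummies orders the columns alphabetically, so index 0 is ‘e’ and index 1 is ‘p’).
# Collapse one-hot outputs to class indices (column order from get_dummies: 'e', 'p')
y_pred_classes = np.argmax(classifier.predict(X_test), axis=1)
y_true_classes = np.argmax(y_test.values, axis=1)
print(metrics.confusion_matrix(y_true_classes, y_pred_classes))
print(metrics.classification_report(y_true_classes, y_pred_classes, target_names=['edible', 'poisonous']))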
Now, let's save the model¶
# saving the trained model to a local file (the filename is our choice)
classifier.save('mushroom_classifier.h5')
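As an optional check (not in the original notebook), the saved file can be loaded back and re-evaluated to confirm it round-trips correctly.
# Reload the saved model and confirm it still evaluates on the test set
from tensorflow.keras.models import load_model
restored = load_model('mushroom_classifier.h5')
restored.evaluate(X_test, y_test, verbose=0)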
DeepCC¶
!deepCC mushroom_classifier.h5