
Determining Wine Quality

Credit: AITS Cainvas Community

Photo by Xiedian on Dribbble

In [1]:
# Import all the necessary libraries

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import Sequential
from tensorflow.keras import regularizers
from tensorflow.keras.callbacks import EarlyStopping
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import os
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.metrics import confusion_matrix
from sklearn.metrics import plot_confusion_matrix  # removed in scikit-learn 1.2; newer versions use ConfusionMatrixDisplay.from_estimator

Download and Unzip the Dataset

In [2]:
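# -O overwrites any earlier download instead of saving a second copy as Wine_dataset.zip.1
!wget -O Wine_dataset.zip 'https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/Wine_dataset.zip'

!unzip -qo Wine_dataset.zip
!rm Wine_dataset.zip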
--2021-09-15 07:59:08--  https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/Wine_dataset.zip
Resolving cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)... 52.219.66.40
Connecting to cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)|52.219.66.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 26176 (26K) [application/x-zip-compressed]
Saving to: ‘Wine_dataset.zip.1’

Wine_dataset.zip.1  100%[===================>]  25.56K  --.-KB/s    in 0.001s  

2021-09-15 07:59:08 (30.0 MB/s) - ‘Wine_dataset.zip.1’ saved [26176/26176]
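If the notebook runs somewhere that does not support `!` shell commands, the download-and-unzip step can be done in plain Python. The following is a minimal sketch that uses only the standard library (`urllib.request`, `zipfile`). It keeps the archive in memory, so no stray `Wine_dataset.zip.1` copy (as in the log above) is left on disk. The helper names `extract_zip_bytes` and `download_and_extract` are hypothetical, and the sketch assumes the archive holds `winequality-red.csv` at its top level, as the next cell implies.

```python
import io
import urllib.request
import zipfile
from pathlib import Path
from typing import List

# URL taken from the wget cell above
DATASET_URL = "https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/Wine_dataset.zip"


def extract_zip_bytes(payload: bytes, dest: str = ".") -> List[str]:
    """Extract an in-memory ZIP archive into `dest`, overwriting existing
    files (like `unzip -o`), and return the names of the archive members."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(dest)
        return archive.namelist()


def download_and_extract(url: str = DATASET_URL, dest: str = ".") -> List[str]:
    """Download the archive into memory and extract it into `dest`."""
    with urllib.request.urlopen(url) as response:
        return extract_zip_bytes(response.read(), dest)
```

After `download_and_extract()` returns, `winequality-red.csv` should be in the working directory and ready for `pd.read_csv`.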

In [3]:
# Loading the data file using the pandas library

data = pd.read_csv('winequality-red.csv', sep = ",")
data.head(10)
Out[3]:
   fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  alcohol  quality
0            7.4              0.70         0.00             1.9      0.076                 11.0                  34.0   0.9978  3.51       0.56      9.4        5
1            7.8              0.88         0.00             2.6      0.098                 25.0                  67.0   0.9968  3.20       0.68      9.8        5
2            7.8              0.76         0.04             2.3      0.092                 15.0                  54.0   0.9970  3.26       0.65      9.8        5
3           11.2              0.28         0.56             1.9      0.075                 17.0                  60.0   0.9980  3.16       0.58      9.8        6
4            7.4              0.70         0.00             1.9      0.076                 11.0                  34.0   0.9978  3.51       0.56      9.4        5
5            7.4              0.66         0.00             1.8      0.075                 13.0                  40.0   0.9978  3.51       0.56      9.4        5
6            7.9              0.60         0.06             1.6      0.069                 15.0                  59.0   0.9964  3.30       0.46      9.4        5
7            7.3              0.65         0.00             1.2      0.065                 15.0                  21.0   0.9946  3.39       0.47     10.0        7
8            7.8              0.58         0.02             2.0      0.073                  9.0                  18.0   0.9968  3.36       0.57      9.5        7
9            7.5              0.50         0.36             6.1      0.071                 17.0                 102.0   0.9978  3.35       0.80     10.5        5

Checking for NULL values

In [4]:
data.isna().sum()
Out[4]:
fixed acidity           0
volatile acidity        0
citric acid             0
residual sugar          0
chlorides               0
free sulfur dioxide     0
total sulfur dioxide    0
density                 0
pH                      0
sulphates               0
alcohol                 0
quality                 0
dtype: int64
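Every count above is zero, so no imputation is needed for this file. If a different copy of the data did contain gaps, a report listing only the affected columns is quicker to scan than the full listing. The following is a sketch: `missing_value_report` is a hypothetical helper and the toy frame is made-up illustration data, not values from the dataset.

```python
import numpy as np
import pandas as pd


def missing_value_report(df: pd.DataFrame) -> pd.Series:
    """Return the number of missing values per column, keeping only the
    columns that actually contain NaNs (empty Series if there are none)."""
    counts = df.isna().sum()
    return counts[counts > 0]


# Hypothetical toy frame with one missing alcohol value, for illustration only
toy = pd.DataFrame({"alcohol": [9.4, np.nan, 9.8], "pH": [3.51, 3.20, 3.26]})
print(missing_value_report(toy))
```

On this notebook's data, `missing_value_report(data)` would return an empty Series, matching the all-zero listing above.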

Data Visualization

In [5]:
# Checking for quality distribution in the dataset
sns.countplot(data = data, x = 'quality')
plt.title("Quality Distribution")
plt.xlabel("Quality Level")
plt.ylabel("Count")
Out[5]:
[Figure: "Quality Distribution" bar chart, Count vs. Quality Level]
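The bar heights in the count plot are simply the number of samples at each quality score, so the same distribution can be read off as numbers. The following is a sketch. `quality_counts` is a hypothetical helper, and the toy series stands in for `data['quality']` with invented values.

```python
import pandas as pd


def quality_counts(quality: pd.Series) -> pd.Series:
    """Number of samples per quality score, ordered by score
    (the same numbers the count plot draws as bars)."""
    return quality.value_counts().sort_index()


# Invented scores for illustration; on the real data pass data['quality']
toy_quality = pd.Series([5, 5, 6, 7, 5, 6, 3, 8])
print(quality_counts(toy_quality))
```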

Since the quality distribution is uneven, with levels 5 and 6 heavily over-represented and the remaining levels comparatively rare, let us turn this into a two-class problem: one class contains quality levels { 3, 4, 5 } and the other contains quality levels { 6, 7, 8 }.

Class 0: { 3, 4, 5 }
Class 1: { 6, 7, 8 }
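The class split is a single threshold at 5. The following is a vectorised sketch of the same rule that the `apply(lambda ...)` in the next cell implements. Boolean comparison followed by `astype(int)` avoids calling a Python function per row. `to_quality_level` is a hypothetical name, and the toy series is illustrative only.

```python
import pandas as pd


def to_quality_level(quality: pd.Series, threshold: int = 5) -> pd.Series:
    """Binary label: 1 if quality > threshold (levels 6-8), else 0 (levels 3-5)."""
    return (quality > threshold).astype(int).rename("quality_level")


# One sample of each quality level present in the dataset
toy_quality = pd.Series([3, 4, 5, 6, 7, 8])
print(to_quality_level(toy_quality).tolist())  # [0, 0, 0, 1, 1, 1]
```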

In [6]:
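# Create a binary quality_level column: 1 for quality > 5 (levels 6-8), 0 otherwise (levels 3-5)
data['quality_level'] = data['quality'].apply(lambda x: 1 if x > 5 else 0)
# Features: the physicochemical columns; target: the new binary label
X = data.drop(columns=['quality', 'quality_level'])
y = data['quality_level'].values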
In [7]:
sns.countplot(data = data, x = 'quality_level')
plt.title("Binary Quality Distribution")
plt.xlabel("Quality Level (0: quality 3-5, 1: quality 6-8)")
plt.ylabel("Count")
Out[7]:
[Figure: "Binary Quality Distribution" bar chart, Count vs. Quality Level]

The graph above shows that the two classes are now much closer in size, so a classifier trained on them is less likely to be dominated by a single over-represented quality level.
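Whether the classes are balanced can also be checked numerically instead of by eye. The following sketch computes each class's share of the samples and the minority-to-majority ratio (1.0 means perfectly balanced). The helper names are hypothetical, and the toy labels are made up. On the real data you would pass `data['quality_level']`.

```python
import pandas as pd


def class_balance(labels: pd.Series) -> pd.Series:
    """Fraction of samples in each class, ordered by class label (sums to 1.0)."""
    return labels.value_counts(normalize=True).sort_index()


def minority_ratio(labels: pd.Series) -> float:
    """Size of the smallest class divided by the size of the largest class."""
    counts = labels.value_counts()
    return float(counts.min() / counts.max())


# Made-up binary labels for illustration
toy_levels = pd.Series([0, 1, 1, 0, 1])
print(class_balance(toy_levels))
print(minority_ratio(toy_levels))
```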

Effect of alcohol on wine quality

In [8]:
# Effect of alcohol level on quality of wine
sns.lineplot(data = data, x = 'quality', y = 'alcohol')
Out[8]:
[Figure: line plot of alcohol vs. quality]
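By default, `sns.lineplot` combines the many alcohol values at each quality score into their mean and shades a confidence interval around it. The plotted line therefore connects the per-quality means. The following pandas sketch produces those points as a table. `mean_alcohol_by_quality` is a hypothetical helper, and the toy frame is invented for illustration.

```python
import pandas as pd


def mean_alcohol_by_quality(df: pd.DataFrame) -> pd.Series:
    """Mean alcohol content per quality score (the points sns.lineplot joins)."""
    return df.groupby("quality")["alcohol"].mean()


# Invented rows for illustration; on the real data pass `data`
toy = pd.DataFrame({
    "quality": [5, 5, 6, 6, 7],
    "alcohol": [9.4, 9.8, 10.0, 10.4, 11.0],
})
print(mean_alcohol_by_quality(toy))
```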

Plotting Pair Plots

In [9]:
# Visualising the relationship between different columns of the data
sns.pairplot(data)
plt.show()
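At this point `data` has 13 columns: the 12 original ones plus `quality_level`. The pair plot is therefore a 13-by-13 grid, which is hard to read at a glance. A compact numeric companion is the Pearson (linear) correlation of each column with `quality`. The following is a sketch. `correlations_with` is a hypothetical helper, and `select_dtypes` is used so it also works on pandas versions that predate `corr(numeric_only=...)`. The toy frame is illustrative only.

```python
import pandas as pd


def correlations_with(df: pd.DataFrame, target: str = "quality") -> pd.Series:
    """Pearson correlation of every other numeric column with `target`,
    sorted from most positive to most negative."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    return corr.sort_values(ascending=False)


# Toy frame: alcohol rises and density falls as quality rises
toy = pd.DataFrame({
    "alcohol": [9.0, 10.0, 11.0, 12.0],
    "density": [1.00, 0.99, 0.98, 0.97],
    "quality": [4, 5, 6, 7],
})
print(correlations_with(toy))
```

On the real data, drop `quality_level` first, for example `correlations_with(data.drop(columns=['quality_level']))`. Because that column is derived from `quality`, it would otherwise top the ranking.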