pH-recognition¶
Credit: AITS Cainvas Community
Photo by Chris Gannon on Dribbble
Litmus paper is used to test whether a given solution is acidic or basic. Once dipped, the paper turns red, blue, or any shade in between, depending on the nature of the solution: red indicates an acidic solution and blue indicates a basic one!
Here we train a model to predict the pH value of a solution from the RGB values of the dipped litmus paper's color.
In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras import models, optimizers, losses, layers, callbacks
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import random
In [2]:
df = pd.read_csv('https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/ph-data.csv')
df
Out[2]:
In [3]:
df['label'].value_counts()
Out[3]:
This is a balanced dataset.
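The balance can also be seen at a glance in a bar chart; a minimal sketch, using the df and matplotlib imports above:
# Bar chart of samples per pH label -- roughly equal bars confirm the balance
df['label'].value_counts().sort_index().plot(kind = 'bar')
plt.xlabel('pH label')
plt.ylabel('number of samples')
plt.show()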
Normalization¶
The RGB channel values, which lie in the range 0-255, are scaled down to the range 0-1 for faster convergence.
In [4]:
df[['red','green','blue']] /= 255
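A quick sanity check that the scaling behaved as intended; a minimal sketch, assuming the cell above has already run:
# All three channels should now lie within [0, 1]
print(df[['red', 'green', 'blue']].min().min(), df[['red', 'green', 'blue']].max().max())
assert df[['red', 'green', 'blue']].max().max() <= 1.0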
Train - val - test split¶
Defining the input and output columns.
In [5]:
# Defining the input and output columns, used to split the dataset in later cells.
input_columns = df.columns.tolist()
input_columns.remove('label')
output_columns = ['label']
print("Number of input columns: ", len(input_columns))
#print("Input columns: ", ', '.join(input_columns))
print("Number of output columns: ", len(output_columns))
#print("Output columns: ", ', '.join(output_columns))
In [6]:
# Splitting into train, val and test set -- 80-10-10 split
# First, an 80-20 split
train_df, val_test_df = train_test_split(df, test_size = 0.2)
# Then split the 20% into half
val_df, test_df = train_test_split(val_test_df, test_size = 0.5)
print("Number of samples in...")
print("Training set: ", len(train_df))
print("Validation set: ", len(val_df))
print("Testing set: ", len(test_df))
In [7]:
# Splitting into X (input) and y (output)
Xtrain, ytrain = np.array(train_df[input_columns]), np.array(train_df[output_columns])
Xval, yval = np.array(val_df[input_columns]), np.array(val_df[output_columns])
Xtest, ytest = np.array(test_df[input_columns]), np.array(test_df[output_columns])
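A quick look at the resulting array shapes; a minimal sketch (each input row holds the 3 RGB channels, each output row one pH label):
# Sanity check: inputs are (n, 3) RGB arrays, outputs are (n, 1) pH labels
print("Xtrain:", Xtrain.shape, " ytrain:", ytrain.shape)
print("Xtest: ", Xtest.shape, " ytest: ", ytest.shape)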
Model¶
In [8]:
model = models.Sequential([
    layers.Dense(128, activation = 'relu', input_shape = Xtrain[0].shape),
    layers.Dense(64, activation = 'relu'),
    #layers.Dense(16, activation = 'relu'),
    layers.Dense(8, activation = 'relu'),
    layers.Dense(1)    # single linear output unit -- pH is predicted as a regression value
])
cb = callbacks.EarlyStopping(patience = 10, restore_best_weights = True)
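By default, EarlyStopping monitors val_loss, so training stops once the validation loss has not improved for 10 consecutive epochs, and restore_best_weights rolls the model back to its best epoch. A sketch equivalent to the callback above, with the monitored quantity spelled out:
# Same callback with the default monitored quantity made explicit
cb = callbacks.EarlyStopping(monitor = 'val_loss', patience = 10, restore_best_weights = True)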
In [9]:
model.summary()
In [10]:
model.compile(optimizer = optimizers.Adam(0.001), loss = losses.MeanSquaredError(), metrics = ['mae'])
history = model.fit(Xtrain, ytrain, validation_data = (Xval, yval), epochs = 1024, callbacks = [cb])
In [11]:
model.evaluate(Xtest, ytest)
Out[11]:
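model.evaluate returns the loss (MSE) followed by the tracked metric (MAE, in pH units). The MAE can be cross-checked directly from the predictions; a minimal sketch:
# Cross-check: mean absolute error computed by hand from the predictions
preds = model.predict(Xtest).reshape(-1)
print("MAE:", np.abs(preds - ytest.reshape(-1)).mean())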
Prediction¶
In [12]:
pd.DataFrame({'True': ytest.reshape(-1)[:10],
              'Predicted': model.predict(Xtest).reshape(-1)[:10]})
Out[12]:
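Since pH is physically bounded to the 0-14 range, the raw regression outputs can be post-processed. The sketch below clips and rounds the predictions; it assumes the labels are integer pH values, which is not enforced anywhere in the original pipeline:
# Clip to the physically valid pH range and round to the nearest integer label
preds = model.predict(Xtest).reshape(-1)
ph_pred = np.clip(np.round(preds), 0, 14).astype(int)
print(ph_pred[:10])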
Plotting the metrics¶
In [13]:
def plot(history, variable, variable2):
    # Plot two training-history curves (e.g. train vs validation) against epochs
    plt.plot(range(len(history[variable])), history[variable])
    plt.plot(range(len(history[variable2])), history[variable2])
    plt.legend([variable, variable2])
    plt.title(variable)
    plt.xlabel('epoch')
    plt.show()
In [14]:
plot(history.history, "loss", "val_loss")
In [15]:
plot(history.history, "mae", "val_mae")
deepC¶
In [16]:
model.save('pH.h5')
!deepCC pH.h5