
Abalone age prediction app

Credit: AITS Cainvas Community

Photo by Nico Medina on Dribbble

Abalone is a common name for sea snails. Determining their age is a detailed process: the shell is cut through the cone, stained, and the rings are counted under a microscope.

Here, we use physical measurements such as length, height, and weight to predict their age instead.

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from keras import models, optimizers, losses, layers, callbacks
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import random

The dataset

Data comes from an original (non-machine-learning) study: Warwick J Nash, Tracy L Sellers, Simon R Talbot, Andrew J Cawthorn and Wes B Ford (1994) "The Population Biology of Abalone (Haliotis species) in Tasmania. I. Blacklip Abalone (H. rubra) from the North Coast and Islands of Bass Strait", Sea Fisheries Division, Technical Report No. 48 (ISSN 1034-3288)

UCI Machine Learning Repository

The dataset is a CSV file containing features of 4177 samples.

In [2]:
df = pd.read_csv('https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/abalone.csv')
df
Out[2]:
Sex Length Diameter Height Whole weight Shucked weight Viscera weight Shell weight Rings
0 M 0.455 0.365 0.095 0.5140 0.2245 0.1010 0.1500 15
1 M 0.350 0.265 0.090 0.2255 0.0995 0.0485 0.0700 7
2 F 0.530 0.420 0.135 0.6770 0.2565 0.1415 0.2100 9
3 M 0.440 0.365 0.125 0.5160 0.2155 0.1140 0.1550 10
4 I 0.330 0.255 0.080 0.2050 0.0895 0.0395 0.0550 7
... ... ... ... ... ... ... ... ... ...
4172 F 0.565 0.450 0.165 0.8870 0.3700 0.2390 0.2490 11
4173 M 0.590 0.440 0.135 0.9660 0.4390 0.2145 0.2605 10
4174 M 0.600 0.475 0.205 1.1760 0.5255 0.2875 0.3080 9
4175 F 0.625 0.485 0.150 1.0945 0.5310 0.2610 0.2960 10
4176 M 0.710 0.555 0.195 1.9485 0.9455 0.3765 0.4950 12

4177 rows × 9 columns
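
Before preprocessing, a quick sanity check on value ranges and missing entries is cheap insurance; a minimal sketch:

# summary statistics for the numeric columns
print(df.describe())

# this dataset ships complete, but checking costs nothing
print(df.isna().sum())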

Preprocessing

Encoding the input columns

In [3]:
# One hot encoding the sex attribute.
df_dummies = pd.get_dummies(df['Sex'], drop_first = True, prefix = "Sex_")

# Inserting dummy columns
for column in df_dummies.columns:
    df[column] = df_dummies[column]
    
# Dropping the original column
df = df.drop(columns = ['Sex'])

df
Out[3]:
Length Diameter Height Whole weight Shucked weight Viscera weight Shell weight Rings Sex__I Sex__M
0 0.455 0.365 0.095 0.5140 0.2245 0.1010 0.1500 15 0 1
1 0.350 0.265 0.090 0.2255 0.0995 0.0485 0.0700 7 0 1
2 0.530 0.420 0.135 0.6770 0.2565 0.1415 0.2100 9 0 0
3 0.440 0.365 0.125 0.5160 0.2155 0.1140 0.1550 10 0 1
4 0.330 0.255 0.080 0.2050 0.0895 0.0395 0.0550 7 1 0
... ... ... ... ... ... ... ... ... ... ...
4172 0.565 0.450 0.165 0.8870 0.3700 0.2390 0.2490 11 0 0
4173 0.590 0.440 0.135 0.9660 0.4390 0.2145 0.2605 10 0 1
4174 0.600 0.475 0.205 1.1760 0.5255 0.2875 0.3080 9 0 1
4175 0.625 0.485 0.150 1.0945 0.5310 0.2610 0.2960 10 0 0
4176 0.710 0.555 0.195 1.9485 0.9455 0.3765 0.4950 12 0 1

4177 rows × 10 columns
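
As an aside, pandas can do the insert-and-drop above in a single call; a sketch of an equivalent one-liner (applied to the raw df, before the loop above):

# drop_first=True omits the redundant third column (the dummy variable trap)
df = pd.get_dummies(df, columns = ['Sex'], drop_first = True, prefix = 'Sex_')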

Encoding the output columns

In [4]:
def rings_label(x):
    if x<=10:
        return 'young'
    if x<=20:
        return 'middle age'
    if x<=30:
        return 'old'
    
df['Rings'] = df['Rings'].apply(rings_label)
In [5]:
df['Rings'].value_counts()
Out[5]:
young         2730
middle age    1411
old             36
Name: Rings, dtype: int64
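
The classes are heavily imbalanced: 'old' accounts for only 36 of the 4177 samples. One common mitigation, not used in this notebook, is to weight the loss by inverse class frequency; a minimal sketch with scikit-learn:

from sklearn.utils.class_weight import compute_class_weight

# weights inversely proportional to class frequency, in the same order
# as the output columns defined later ('young', 'middle age', 'old')
classes = np.array(['young', 'middle age', 'old'])
weights = compute_class_weight('balanced', classes = classes, y = df['Rings'])
class_weight = dict(enumerate(weights))    # pass via model.fit(..., class_weight = class_weight)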
In [6]:
# One hot encoding the rings attribute (all three columns kept, to serve as softmax targets).
df_dummies = pd.get_dummies(df['Rings'])

# Inserting dummy columns
for column in df_dummies.columns:
    df[column] = df_dummies[column]
    
# Dropping the original column
df = df.drop(columns = ['Rings'])

df
Out[6]:
Length Diameter Height Whole weight Shucked weight Viscera weight Shell weight Sex__I Sex__M middle age old young
0 0.455 0.365 0.095 0.5140 0.2245 0.1010 0.1500 0 1 1 0 0
1 0.350 0.265 0.090 0.2255 0.0995 0.0485 0.0700 0 1 0 0 1
2 0.530 0.420 0.135 0.6770 0.2565 0.1415 0.2100 0 0 0 0 1
3 0.440 0.365 0.125 0.5160 0.2155 0.1140 0.1550 0 1 0 0 1
4 0.330 0.255 0.080 0.2050 0.0895 0.0395 0.0550 1 0 0 0 1
... ... ... ... ... ... ... ... ... ... ... ... ...
4172 0.565 0.450 0.165 0.8870 0.3700 0.2390 0.2490 0 0 1 0 0
4173 0.590 0.440 0.135 0.9660 0.4390 0.2145 0.2605 0 1 0 0 1
4174 0.600 0.475 0.205 1.1760 0.5255 0.2875 0.3080 0 1 0 0 1
4175 0.625 0.485 0.150 1.0945 0.5310 0.2610 0.2960 0 0 0 0 1
4176 0.710 0.555 0.195 1.9485 0.9455 0.3765 0.4950 0 1 1 0 0

4177 rows × 12 columns

Defining the input and output columns

In [7]:
# defining the input and output columns to separate the dataset in the later cells.

input_columns = df.columns.tolist()
input_columns.remove('young')
input_columns.remove('middle age')
input_columns.remove('old')

output_columns = ['young', 'middle age', 'old']

print("Number of input columns: ", len(input_columns))
#print("Input columns: ", ', '.join(input_columns))

print("Number of output columns: ", len(output_columns))
#print("Output columns: ", ', '.join(output_columns))
Number of input columns:  9
Number of output columns:  3

Train validation test split

In [8]:
# Splitting into train, val and test set -- 80-10-10 split

# First, an 80-20 split
train_df, val_test_df = train_test_split(df, test_size = 0.2)

# Then split the 20% into half
val_df, test_df = train_test_split(val_test_df, test_size = 0.5)

print("Number of samples in...")
print("Training set: ", len(train_df))
print("Validation set: ", len(val_df))
print("Testing set: ", len(test_df))
Number of samples in...
Training set:  3341
Validation set:  418
Testing set:  418
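
Since 'old' is so rare, a plain random split can leave the validation or test set with almost no 'old' samples. Stratifying on the class label is a possible alternative; a sketch under that assumption:

# hypothetical alternative: each split keeps roughly the same class proportions
labels = df[output_columns].values.argmax(axis = 1)
train_df, val_test_df = train_test_split(df, test_size = 0.2, stratify = labels)

vt_labels = val_test_df[output_columns].values.argmax(axis = 1)
val_df, test_df = train_test_split(val_test_df, test_size = 0.5, stratify = vt_labels)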
In [9]:
# Splitting into X (input) and y (output)

Xtrain, ytrain = np.array(train_df[input_columns]), np.array(train_df[output_columns])

Xval, yval = np.array(val_df[input_columns]), np.array(val_df[output_columns])

Xtest, ytest = np.array(test_df[input_columns]), np.array(test_df[output_columns])

Standardization

In [10]:
ss = StandardScaler()

Xtrain = ss.fit_transform(Xtrain)
Xval = ss.transform(Xval)
Xtest = ss.transform(Xtest)
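
The scaler is fit on the training set only, so no statistics from the validation or test sets leak into training. For deployment, the fitted scaler has to be saved alongside the model; a minimal sketch using joblib (an assumption here; any serializer works):

import joblib

joblib.dump(ss, 'abalone_scaler.joblib')    # persist the mean/std fitted on the training set
ss = joblib.load('abalone_scaler.joblib')   # reload at inference time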

The model

In [11]:
model = models.Sequential([
    layers.Dense(32, activation = 'relu', input_shape = Xtrain[0].shape),
    layers.Dense(8, activation = 'relu'),
    layers.Dense(3, activation = 'softmax')
])

cb = callbacks.EarlyStopping(patience = 5, restore_best_weights = True)
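
EarlyStopping monitors val_loss by default; with restore_best_weights = True, the weights from the best epoch are restored when training stops. To inspect the network before training:

model.summary()    # layer output shapes and trainable parameter counts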
In [12]:
model.compile(optimizer = optimizers.Adam(0.001), loss = losses.CategoricalCrossentropy(), metrics = ['accuracy'])

history = model.fit(Xtrain, ytrain, validation_data = (Xval, yval), epochs = 256, callbacks = cb)
Epoch 1/256
105/105 [==============================] - 0s 3ms/step - loss: 0.8207 - accuracy: 0.5965 - val_loss: 0.6156 - val_accuracy: 0.7464
Epoch 2/256
105/105 [==============================] - 0s 2ms/step - loss: 0.5924 - accuracy: 0.7204 - val_loss: 0.5516 - val_accuracy: 0.7536
Epoch 3/256
105/105 [==============================] - 0s 2ms/step - loss: 0.5505 - accuracy: 0.7381 - val_loss: 0.5263 - val_accuracy: 0.7536
Epoch 4/256
105/105 [==============================] - 0s 2ms/step - loss: 0.5265 - accuracy: 0.7492 - val_loss: 0.5079 - val_accuracy: 0.7632
Epoch 5/256
105/105 [==============================] - 0s 2ms/step - loss: 0.5081 - accuracy: 0.7611 - val_loss: 0.5002 - val_accuracy: 0.7799
Epoch 6/256
105/105 [==============================] - 0s 2ms/step - loss: 0.4973 - accuracy: 0.7641 - val_loss: 0.4963 - val_accuracy: 0.7799
Epoch 7/256
105/105 [==============================] - 0s 2ms/step - loss: 0.4878 - accuracy: 0.7719 - val_loss: 0.4919 - val_accuracy: 0.7775
Epoch 8/256
105/105 [==============================] - 0s 2ms/step - loss: 0.4824 - accuracy: 0.7731 - val_loss: 0.4878 - val_accuracy: 0.7847
Epoch 9/256
105/105 [==============================] - 0s 2ms/step - loss: 0.4777 - accuracy: 0.7767 - val_loss: 0.4924 - val_accuracy: 0.7871
Epoch 10/256
105/105 [==============================] - 0s 2ms/step - loss: 0.4745 - accuracy: 0.7770 - val_loss: 0.4914 - val_accuracy: 0.7919
Epoch 11/256
105/105 [==============================] - 0s 2ms/step - loss: 0.4722 - accuracy: 0.7719 - val_loss: 0.4863 - val_accuracy: 0.7871
Epoch 12/256
105/105 [==============================] - 0s 2ms/step - loss: 0.4676 - accuracy: 0.7785 - val_loss: 0.4876 - val_accuracy: 0.7823
Epoch 13/256
105/105 [==============================] - 0s 2ms/step - loss: 0.4661 - accuracy: 0.7836 - val_loss: 0.4975 - val_accuracy: 0.7967
Epoch 14/256
105/105 [==============================] - 0s 2ms/step - loss: 0.4664 - accuracy: 0.7767 - val_loss: 0.4922 - val_accuracy: 0.7847
Epoch 15/256
105/105 [==============================] - 0s 2ms/step - loss: 0.4610 - accuracy: 0.7800 - val_loss: 0.4941 - val_accuracy: 0.7990
Epoch 16/256
105/105 [==============================] - 0s 2ms/step - loss: 0.4601 - accuracy: 0.7830 - val_loss: 0.4907 - val_accuracy: 0.7967
In [13]:
model.evaluate(Xtest, ytest)
14/14 [==============================] - 0s 1ms/step - loss: 0.4695 - accuracy: 0.7967
Out[13]:
[0.4695490002632141, 0.7966507077217102]
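
Overall accuracy hides how the tiny 'old' class fares. A per-class breakdown is more informative; a sketch using scikit-learn (zero_division = 0 guards against a class that is never predicted):

from sklearn.metrics import classification_report

y_true = np.argmax(ytest, axis = 1)
y_pred = np.argmax(model.predict(Xtest), axis = 1)
print(classification_report(y_true, y_pred, labels = [0, 1, 2],
                            target_names = output_columns, zero_division = 0))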
In [14]:
cm = confusion_matrix(np.argmax(ytest, axis = 1), np.argmax(model.predict(Xtest), axis = 1))
cm = cm.astype('float') / cm.sum(axis = 1)[:, np.newaxis]    # normalize each row (true class) to fractions

fig = plt.figure(figsize = (10, 10))
ax = fig.add_subplot(111)

for i in range(cm.shape[1]):
    for j in range(cm.shape[0]):
        if cm[i,j] > 0.8:
            clr = "white"
        else:
            clr = "black"
        ax.text(j, i, format(cm[i, j], '.2f'), horizontalalignment="center", color=clr)

_ = ax.imshow(cm, cmap=plt.cm.Blues)
ax.set_xticks(range(3))
ax.set_yticks(range(3))
ax.set_xticklabels(output_columns, rotation = 90)
ax.set_yticklabels(output_columns)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.show()

Other attributes, such as weather patterns and location (and hence food availability), could help classify the samples more accurately.

Plotting the metrics

In [15]:
def plot(history, variable, variable2):
    # plot a training metric and its validation counterpart against epochs
    plt.plot(range(len(history[variable])), history[variable])
    plt.plot(range(len(history[variable2])), history[variable2])
    plt.legend([variable, variable2])
    plt.title(variable)
    plt.xlabel('epoch')
In [16]:
plot(history.history, "loss", "val_loss")
In [17]:
plot(history.history, "accuracy", "val_accuracy")

Prediction

In [18]:
# pick a random sample from the test set
x = random.randint(0, len(Xtest) - 1)

output = model.predict(Xtest[x].reshape(1, -1))[0]
print("Predicted: ", output_columns[np.argmax(output)])   
print("Probability: ", output[np.argmax(output)])

print("True: ", output_columns[np.argmax(ytest[x])])
Predicted:  young
Probability:  0.9382079
True:  young

deepC

In [19]:
model.save('abalone.h5')

!deepCC abalone.h5
[INFO]
Reading [keras model] 'abalone.h5'
[SUCCESS]
Saved 'abalone_deepC/abalone.onnx'
[INFO]
Reading [onnx model] 'abalone_deepC/abalone.onnx'
[INFO]
Model info:
  ir_vesion : 4
  doc       : 
[WARNING]
[ONNX]: terminal (input/output) dense_input's shape is less than 1. Changing it to 1.
[WARNING]
[ONNX]: terminal (input/output) dense_2's shape is less than 1. Changing it to 1.
WARN (GRAPH): found operator node with the same name (dense_2) as io node.
[INFO]
Running DNNC graph sanity check ...
[SUCCESS]
Passed sanity check.
[INFO]
Writing C++ file 'abalone_deepC/abalone.cpp'
[INFO]
deepSea model files are ready in 'abalone_deepC/' 
[RUNNING COMMAND]
g++ -std=c++11 -O3 -fno-rtti -fno-exceptions -I. -I/opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/include -isystem /opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/packages/eigen-eigen-323c052e1731 "abalone_deepC/abalone.cpp" -D_AITS_MAIN -o "abalone_deepC/abalone.exe"
[RUNNING COMMAND]
size "abalone_deepC/abalone.exe"
   text	   data	    bss	    dec	    hex	filename
 123489	   2584	    760	 126833	  1ef71	abalone_deepC/abalone.exe
[SUCCESS]
Saved model as executable "abalone_deepC/abalone.exe"
In [20]:
x = random.randint(0, len(Xtest) - 1)
print(x)
np.savetxt('sample.data', Xtest[x])    # xth sample into text file

# run exe with input
!abalone_deepC/abalone.exe sample.data

# show predicted output
nn_out = np.loadtxt('deepSea_result_1.out')
print(model.predict(Xtest[x].reshape(1, -1))[0])
print(nn_out)
#print(x, Xtest[x])
print("Predicted: ", output_columns[np.argmax(nn_out)])   
print("Probability: ", nn_out[np.argmax(nn_out)])
#print(x, Xtest[x])
print("True: ", output_columns[np.argmax(ytest[x])])
10
writing file deepSea_result_1.out.
[0.8444116  0.15433694 0.00125141]
[0. 1. 0.]
Predicted:  middle age
Probability:  1.0
True:  young
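
Note that in this run the compiled model's output ([0. 1. 0.]) does not match the Keras probabilities for the same sample. A hedged sketch to quantify agreement across the whole test set (assuming, as the log above suggests, that the executable rewrites deepSea_result_1.out on every run):

import os

agree = 0
for i in range(len(Xtest)):
    np.savetxt('sample.data', Xtest[i])
    os.system('abalone_deepC/abalone.exe sample.data > /dev/null')
    nn_out = np.loadtxt('deepSea_result_1.out')
    keras_pred = np.argmax(model.predict(Xtest[i].reshape(1, -1))[0])
    agree += int(np.argmax(nn_out) == keras_pred)

print("Keras / deepC agreement: ", agree / len(Xtest))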