Cainvas

Classification on Organic Compounds

Credit: AITS Cainvas Community

Photo by MaryArty on Dribbble

In [1]:
# get data file
!wget -N "https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/musk.csv"
--2021-07-13 11:06:24--  https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/musk.csv
Resolving cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)... 52.219.66.124
Connecting to cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)|52.219.66.124|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4622724 (4.4M) [text/csv]
Saving to: ‘musk.csv’

musk.csv            100%[===================>]   4.41M  --.-KB/s    in 0.04s   

2021-07-13 11:06:24 (119 MB/s) - ‘musk.csv’ saved [4622724/4622724]

In [2]:
# Import the required libraries
import tensorflow as tf
import pandas as pd
import matplotlib.pyplot as plt
In [3]:
# read the csv file
dataset = pd.read_csv('musk.csv')
dataset.head()
Out[3]:
ID molecule_name conformation_name f1 f2 f3 f4 f5 f6 f7 ... f158 f159 f160 f161 f162 f163 f164 f165 f166 class
0 1 MUSK-211 211_1+1 46 -108 -60 -69 -117 49 38 ... -308 52 -7 39 126 156 -50 -112 96 1
1 2 MUSK-211 211_1+10 41 -188 -145 22 -117 -6 57 ... -59 -2 52 103 136 169 -61 -136 79 1
2 3 MUSK-211 211_1+11 46 -194 -145 28 -117 73 57 ... -134 -154 57 143 142 165 -67 -145 39 1
3 4 MUSK-211 211_1+12 41 -188 -145 22 -117 -7 57 ... -60 -4 52 104 136 168 -60 -135 80 1
4 5 MUSK-211 211_1+13 41 -188 -145 22 -117 -7 57 ... -60 -4 52 104 137 168 -60 -135 80 1

5 rows × 170 columns

Data Preprocessing

In [4]:
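# Features: columns f1..f166 (skip ID, molecule_name, conformation_name)
X = dataset.iloc[:, 3:-1].values
# Target: the final 'class' column
y = dataset.iloc[:, -1].values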
In [5]:
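# Standardize each feature to zero mean and unit variance
# (note: the scaler is fitted on all rows, before the train/test split)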
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
X = ss.fit_transform(X)

Split the data for training and testing

In [6]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 42)
In [7]:
X_train.shape,y_train.shape
Out[7]:
((4618, 166), (4618,))
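
Because the scaler in cell 5 is fitted before the split, test-set statistics influence the training features. A minimal sketch of the leakage-free ordering follows, using only the standard library and a made-up single feature column. The values and the names `train_col`, `test_col`, and `standardize` are illustrative and do not come from the notebook. With scikit-learn the same ordering is `ss.fit_transform(X_train)` followed by `ss.transform(X_test)`.

```python
from statistics import fmean, pstdev

# Hypothetical single feature column, already split into train and test parts.
train_col = [46.0, 41.0, 46.0, 41.0, 41.0, 52.0]
test_col = [44.0, 60.0]

# Fit: the statistics come from the training rows only.
mu = fmean(train_col)
sigma = pstdev(train_col)  # population std (ddof=0), the convention StandardScaler uses


def standardize(values, mu, sigma):
    """Apply (x - mean) / std using statistics fitted elsewhere."""
    return [(v - mu) / sigma for v in values]


train_scaled = standardize(train_col, mu, sigma)
test_scaled = standardize(test_col, mu, sigma)  # reuses the training statistics

print(round(pstdev(train_scaled), 6))
```

On a dataset of this size the effect on the reported metrics is probably small, but the ordering matters whenever the test score is meant to estimate performance on unseen data.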

Build and train the model

In [8]:
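model = tf.keras.models.Sequential([
    # Hidden layer: 33 tanh units over the 166 input features
    tf.keras.layers.Dense(33, input_shape=(166,), activation=tf.nn.tanh),
    # Output layer: one sigmoid unit giving the probability of class 1
    tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)
])
model.summary()
# Compile for binary classification
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])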
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 33)                5511      
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 34        
=================================================================
Total params: 5,545
Trainable params: 5,545
Non-trainable params: 0
_________________________________________________________________
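
The parameter counts in the summary can be checked by hand: a Dense layer holds one weight per input-unit pair plus one bias per unit. The short sketch below reproduces the figures; `dense_params` is a helper name introduced here for illustration.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


hidden = dense_params(166, 33)   # 166 features -> 33 tanh units
output = dense_params(33, 1)     # 33 hidden units -> 1 sigmoid unit
total = hidden + output
print(hidden, output, total)     # 5511 34 5545
```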
In [9]:
history = model.fit(X_train, y_train, validation_data = (X_test, y_test), epochs = 15)
Epoch 1/15
145/145 [==============================] - 0s 2ms/step - loss: 0.4147 - accuracy: 0.8367 - val_loss: 0.2794 - val_accuracy: 0.9146
Epoch 2/15
145/145 [==============================] - 0s 2ms/step - loss: 0.2333 - accuracy: 0.9290 - val_loss: 0.2037 - val_accuracy: 0.9293
Epoch 3/15
145/145 [==============================] - 0s 2ms/step - loss: 0.1808 - accuracy: 0.9415 - val_loss: 0.1635 - val_accuracy: 0.9500
Epoch 4/15
145/145 [==============================] - 0s 2ms/step - loss: 0.1522 - accuracy: 0.9511 - val_loss: 0.1428 - val_accuracy: 0.9545
Epoch 5/15
145/145 [==============================] - 0s 2ms/step - loss: 0.1317 - accuracy: 0.9580 - val_loss: 0.1245 - val_accuracy: 0.9626
Epoch 6/15
145/145 [==============================] - 0s 2ms/step - loss: 0.1162 - accuracy: 0.9643 - val_loss: 0.1204 - val_accuracy: 0.9581
Epoch 7/15
145/145 [==============================] - 0s 2ms/step - loss: 0.1027 - accuracy: 0.9664 - val_loss: 0.1113 - val_accuracy: 0.9616
Epoch 8/15
145/145 [==============================] - 0s 2ms/step - loss: 0.0929 - accuracy: 0.9701 - val_loss: 0.0969 - val_accuracy: 0.9677
Epoch 9/15
145/145 [==============================] - 0s 2ms/step - loss: 0.0829 - accuracy: 0.9729 - val_loss: 0.0874 - val_accuracy: 0.9737
Epoch 10/15
145/145 [==============================] - 0s 2ms/step - loss: 0.0766 - accuracy: 0.9729 - val_loss: 0.0821 - val_accuracy: 0.9747
Epoch 11/15
145/145 [==============================] - 0s 2ms/step - loss: 0.0695 - accuracy: 0.9786 - val_loss: 0.0755 - val_accuracy: 0.9753
Epoch 12/15
145/145 [==============================] - 0s 2ms/step - loss: 0.0627 - accuracy: 0.9816 - val_loss: 0.0700 - val_accuracy: 0.9788
Epoch 13/15
145/145 [==============================] - 0s 2ms/step - loss: 0.0584 - accuracy: 0.9827 - val_loss: 0.0668 - val_accuracy: 0.9783
Epoch 14/15
145/145 [==============================] - 0s 2ms/step - loss: 0.0511 - accuracy: 0.9864 - val_loss: 0.0618 - val_accuracy: 0.9803
Epoch 15/15
145/145 [==============================] - 0s 2ms/step - loss: 0.0471 - accuracy: 0.9857 - val_loss: 0.0554 - val_accuracy: 0.9833
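
The `145/145` counter is the number of mini-batches per epoch. Assuming Keras' default `batch_size` of 32 (the `fit` call does not set one), the count follows from the training-set size, and the same arithmetic gives the `62/62` shown later by `model.evaluate`. The total of 6598 rows is inferred here from the 4618 training rows and the 1980 test rows; it is not printed by the notebook.

```python
import math

n_train = 4618      # rows in X_train, from Out[7]
n_test = 1980       # rows in X_test, from the classification report support
batch_size = 32     # assumption: Keras default when fit() gets no batch_size

train_steps = math.ceil(n_train / batch_size)
test_steps = math.ceil(n_test / batch_size)
print(train_steps, test_steps)                # 145 62

# The split itself: test_size=0.3 of the inferred total
n_total = n_train + n_test
print(n_total, round(n_test / n_total, 3))    # 6598 0.3
```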
In [10]:
# Save the model
model.save("Simple ANN.h5")

Plots

In [11]:
# Training loss vs. validation loss (plt was imported in cell 2)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
In [12]:
# Training accuracy vs. validation accuracy (plt was imported in cell 2)
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('acc')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

Predictions and Accuracy

In [13]:
# Evaluate on the test set; returns [loss, accuracy]
model.evaluate(X_test, y_test)
62/62 [==============================] - 0s 951us/step - loss: 0.0554 - accuracy: 0.9833
Out[13]:
[0.055376846343278885, 0.9833333492279053]
In [14]:
y_pred = model.predict(X_test)
y_pred[5:10]
Out[14]:
array([[8.6293503e-04],
       [4.8145223e-01],
       [3.1779730e-03],
       [2.4487602e-03],
       [2.7712667e-04]], dtype=float32)
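
Each value is the sigmoid output, which can be read as the estimated probability of class 1. As a small sketch, the five probabilities shown above are mapped to labels with the 0.5 cut-off used in the next cell. The `margins` calculation is an addition for illustration; it flags the prediction that sits closest to the decision boundary.

```python
# Probabilities copied from Out[14] (rows 5..9 of y_pred)
probs = [8.6293503e-04, 4.8145223e-01, 3.1779730e-03, 2.4487602e-03, 2.7712667e-04]

threshold = 0.5
labels = [1 if p > threshold else 0 for p in probs]
print(labels)             # [0, 0, 0, 0, 0]

# Distance from the decision boundary; small values mark low-confidence predictions
margins = [abs(p - threshold) for p in probs]
least_confident = margins.index(min(margins))
print(least_confident)    # 1
```

The second value (about 0.48) is still assigned to class 0, but only narrowly, so a slightly lower threshold would flip it.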
In [15]:
# Threshold the predicted probabilities at 0.5 to get class labels (0 or 1)
y_pred1 = (y_pred.ravel() > 0.5).astype(int).tolist()
In [16]:
y_pred1[25:40]
Out[16]:
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
In [17]:
y_test[25:40]
Out[17]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0])
In [18]:
# Print per-class precision, recall and F1-score
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred1))
              precision    recall  f1-score   support

           0       0.98      1.00      0.99      1673
           1       0.98      0.91      0.94       307

    accuracy                           0.98      1980
   macro avg       0.98      0.95      0.97      1980
weighted avg       0.98      0.98      0.98      1980
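
The support column shows that the classes are imbalanced (1673 vs. 307), so accuracy alone overstates how hard the task is. The sketch below computes the accuracy of a trivial always-predict-0 model from those support counts, and shows how precision, recall and F1 are derived from raw counts. The `tp`, `fp` and `fn` values passed to the helper are hypothetical, chosen only to illustrate the formulas; the notebook does not list the actual counts.

```python
# Support counts from the report above
n_neg, n_pos = 1673, 307
n_total = n_neg + n_pos

# Accuracy of a trivial model that always predicts class 0
baseline_acc = n_neg / n_total
print(round(baseline_acc, 3))                    # 0.845


def precision_recall_f1(tp: int, fp: int, fn: int):
    """Per-class metrics from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for illustration only
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(round(p, 2), round(r, 2), round(f1, 2))    # 0.9 0.75 0.82
```

Against a baseline of roughly 0.845, the reported 0.98 accuracy is a real improvement, and the class-1 recall of 0.91 is the figure that best reflects the remaining errors.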

Heat Map

In [19]:
import seaborn as sn

# Rows are actual classes, columns are predicted classes
cm = tf.math.confusion_matrix(labels = y_test, predictions = y_pred1)

plt.figure(figsize = (10,8))
sn.heatmap(cm, annot = True, fmt = 'd')
plt.xlabel("predicted")
plt.ylabel("actual")
plt.show()

deepCC

In [21]:
!deepCC "Simple ANN.h5"
[INFO]
Reading [keras model] 'Simple ANN.h5'
[SUCCESS]
Saved 'Simple ANN_deepC/Simple ANN.onnx'
[INFO]
Reading [onnx model] 'Simple ANN_deepC/Simple ANN.onnx'
[INFO]
Model info:
  ir_vesion : 4
  doc       : 
[WARNING]
[ONNX]: terminal (input/output) dense_input's shape is less than 1. Changing it to 1.
[WARNING]
[ONNX]: terminal (input/output) dense_1's shape is less than 1. Changing it to 1.
WARN (GRAPH): found operator node with the same name (dense_1) as io node.
[INFO]
Running DNNC graph sanity check ...
[SUCCESS]
Passed sanity check.
[INFO]
Writing C++ file 'Simple ANN_deepC/Simple ANN.cpp'
[INFO]
deepSea model files are ready in 'Simple ANN_deepC/' 
[RUNNING COMMAND]
g++ -std=c++11 -O3 -fno-rtti -fno-exceptions -I. -I/opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/include -isystem /opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/packages/eigen-eigen-323c052e1731 "Simple ANN_deepC/Simple ANN.cpp" -D_AITS_MAIN -o "Simple ANN_deepC/Simple ANN.exe"
[RUNNING COMMAND]
size "Simple ANN_deepC/Simple ANN.exe"
   text	   data	    bss	    dec	    hex	filename
 140331	   2968	    760	 144059	  232bb	Simple ANN_deepC/Simple ANN.exe
[SUCCESS]
Saved model as executable "Simple ANN_deepC/Simple ANN.exe"
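
In the `size` output, `text` is code plus read-only data, `data` is initialized writable data, and `bss` is zero-initialized data; `dec` and `hex` are their sum in decimal and hexadecimal. A quick check of the figures above:

```python
# Section sizes in bytes, copied from the `size` output above
text, data, bss = 140331, 2968, 760

dec = text + data + bss
print(dec, format(dec, "x"))    # 144059 232bb
print(round(dec / 1024, 1))     # 140.7
```

The compiled model therefore occupies about 141 KiB in total, most of it in the `text` section.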