Classification on Organic Compounds

# get data file
!wget -N "https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/musk.csv"

--2021-07-13 11:06:24--  https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/musk.csv
Resolving cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)... 52.219.66.124
Connecting to cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)|52.219.66.124|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4622724 (4.4M) [text/csv]
Saving to: ‘musk.csv’

musk.csv            100%[===================>]   4.41M  --.-KB/s    in 0.04s   

2021-07-13 11:06:24 (119 MB/s) - ‘musk.csv’ saved [4622724/4622724]

# Import the required libraries
import tensorflow as tf
import pandas as pd
import matplotlib.pyplot as plt

# read the csv file
dataset = pd.read_csv('musk.csv')
dataset.head()

Data Preprocessing

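# Features: skip the first three columns (ID, molecule_name, conformation_name)
# and keep f1..f166; the last column ("class") is the label.
X = dataset.iloc[:, 3:-1].values
y = dataset.iloc[:, -1].values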
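To make the column slicing concrete, here is a small sketch on a toy frame. It assumes the same column layout as musk.csv but keeps only three feature columns; the values are copied from the first two rows of the dataset.head() output, and the names `toy`, `features`, and `labels` are illustrative only.

```python
import pandas as pd

# Toy frame with the musk.csv column layout, reduced to three features
# (assumption for illustration; the real file has f1..f166).
toy = pd.DataFrame({
    "ID": [1, 2],
    "molecule_name": ["MUSK-211", "MUSK-211"],
    "conformation_name": ["211_1+1", "211_1+10"],
    "f1": [46, 41],
    "f2": [-108, -188],
    "f3": [-60, -145],
    "class": [1, 1],
})

# Columns from position 3 up to (not including) the last one: the features.
features = toy.iloc[:, 3:-1]
# The last column: the class label.
labels = toy.iloc[:, -1]

print(list(features.columns))  # ['f1', 'f2', 'f3']
print(labels.name)             # class
```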

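# Scaling: standardize each feature to zero mean and unit variance.
# Note: the scaler is fit on the full dataset, before the train/test split,
# so statistics from the test rows influence the preprocessing.
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
X = ss.fit_transform(X)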

Split the data for training and testing

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 42)
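The notebook scales the full feature matrix before splitting. A common alternative, sketched here on synthetic data rather than on musk.csv, is to split first and fit the scaler on the training rows only, so the test rows stay unseen during preprocessing. The array names and the random data are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the musk features and labels (assumption).
rng = np.random.default_rng(42)
X_raw = rng.normal(loc=5.0, scale=3.0, size=(200, 4))
y_raw = rng.integers(0, 2, size=200)

# Split first, so the test rows play no part in fitting the scaler.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_raw, test_size=0.3, random_state=42
)

# Fit on the training rows only, then apply the same transform to the test rows.
scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)
X_te_scaled = scaler.transform(X_te)

# Training features end up with mean ~0 and standard deviation ~1;
# test features are only approximately standardized.
print(X_tr_scaled.mean(axis=0).round(6))
print(X_tr_scaled.std(axis=0).round(6))
```

With this ordering, the test metrics would likely differ slightly from the ones recorded in this notebook.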

X_train.shape,y_train.shape

((4618, 166), (4618,))
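As a quick sanity check on these shapes, the split sizes can be reproduced by hand. The sketch below takes the 4618 training rows from the output above and the 1980 test rows from the support total in the classification report; it relies on `train_test_split` rounding the test share up when `test_size` is a float.

```python
import math

n_train = 4618  # rows in X_train, from the shape printed above
n_test = 1980   # rows in the test split (support total in the report)
n_total = n_train + n_test

# With a float test_size, the test share is rounded up to a whole row count.
expected_test = math.ceil(0.3 * n_total)

print(n_total)                  # 6598
print(expected_test)            # 1980
print(n_total - expected_test)  # 4618
```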

Build and train the model

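# One hidden layer of 33 tanh units over the 166 input features,
# followed by a single sigmoid unit that outputs the probability of class 1.
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(33, input_shape=(166,),
                          activation=tf.nn.tanh),
    tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)
])
model.summary()
# Compile for binary classification
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])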

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 33)                5511      
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 34        
=================================================================
Total params: 5,545
Trainable params: 5,545
Non-trainable params: 0
_________________________________________________________________
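The parameter counts in the summary follow from the layer sizes: a dense layer has one weight per input-unit pair plus one bias per unit. The helper below is a hypothetical name used only to show the arithmetic.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_params(166, 33)  # 166 features into 33 tanh units
output = dense_params(33, 1)    # 33 hidden units into 1 sigmoid unit

print(hidden, output, hidden + output)  # 5511 34 5545
```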

# Note: the test split is reused as validation data during training.
history = model.fit(X_train, y_train, validation_data = (X_test, y_test), epochs = 15)

Epoch 1/15
145/145 [==============================] - 0s 2ms/step - loss: 0.4147 - accuracy: 0.8367 - val_loss: 0.2794 - val_accuracy: 0.9146
Epoch 2/15
145/145 [==============================] - 0s 2ms/step - loss: 0.2333 - accuracy: 0.9290 - val_loss: 0.2037 - val_accuracy: 0.9293
Epoch 3/15
145/145 [==============================] - 0s 2ms/step - loss: 0.1808 - accuracy: 0.9415 - val_loss: 0.1635 - val_accuracy: 0.9500
Epoch 4/15
145/145 [==============================] - 0s 2ms/step - loss: 0.1522 - accuracy: 0.9511 - val_loss: 0.1428 - val_accuracy: 0.9545
Epoch 5/15
145/145 [==============================] - 0s 2ms/step - loss: 0.1317 - accuracy: 0.9580 - val_loss: 0.1245 - val_accuracy: 0.9626
Epoch 6/15
145/145 [==============================] - 0s 2ms/step - loss: 0.1162 - accuracy: 0.9643 - val_loss: 0.1204 - val_accuracy: 0.9581
Epoch 7/15
145/145 [==============================] - 0s 2ms/step - loss: 0.1027 - accuracy: 0.9664 - val_loss: 0.1113 - val_accuracy: 0.9616
Epoch 8/15
145/145 [==============================] - 0s 2ms/step - loss: 0.0929 - accuracy: 0.9701 - val_loss: 0.0969 - val_accuracy: 0.9677
Epoch 9/15
145/145 [==============================] - 0s 2ms/step - loss: 0.0829 - accuracy: 0.9729 - val_loss: 0.0874 - val_accuracy: 0.9737
Epoch 10/15
145/145 [==============================] - 0s 2ms/step - loss: 0.0766 - accuracy: 0.9729 - val_loss: 0.0821 - val_accuracy: 0.9747
Epoch 11/15
145/145 [==============================] - 0s 2ms/step - loss: 0.0695 - accuracy: 0.9786 - val_loss: 0.0755 - val_accuracy: 0.9753
Epoch 12/15
145/145 [==============================] - 0s 2ms/step - loss: 0.0627 - accuracy: 0.9816 - val_loss: 0.0700 - val_accuracy: 0.9788
Epoch 13/15
145/145 [==============================] - 0s 2ms/step - loss: 0.0584 - accuracy: 0.9827 - val_loss: 0.0668 - val_accuracy: 0.9783
Epoch 14/15
145/145 [==============================] - 0s 2ms/step - loss: 0.0511 - accuracy: 0.9864 - val_loss: 0.0618 - val_accuracy: 0.9803
Epoch 15/15
145/145 [==============================] - 0s 2ms/step - loss: 0.0471 - accuracy: 0.9857 - val_loss: 0.0554 - val_accuracy: 0.9833
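The `145/145` step counter in the log can be checked by hand. Since `fit` is called without a `batch_size`, Keras falls back to its default of 32, and the last partial batch still counts as a step. The same arithmetic gives the 62 steps reported by `model.evaluate` on the 1980 test rows.

```python
import math

batch_size = 32  # Keras default when no batch_size is passed

train_steps = math.ceil(4618 / batch_size)  # training rows per epoch
test_steps = math.ceil(1980 / batch_size)   # test rows during evaluation

print(train_steps, test_steps)  # 145 62
```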

# Save the model
model.save("Simple ANN.h5")

Plots

# Training loss vs. validation loss (plt is already imported above)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

# Training accuracy vs. validation accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('acc')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

Predictions and Accuracy

# accuracy and loss
model.evaluate(X_test, y_test)

62/62 [==============================] - 0s 951us/step - loss: 0.0554 - accuracy: 0.9833

[0.055376846343278885, 0.9833333492279053]

y_pred = model.predict(X_test)
y_pred[5:10]

array([[8.6293503e-04],
       [4.8145223e-01],
       [3.1779730e-03],
       [2.4487602e-03],
       [2.7712667e-04]], dtype=float32)

# Turn sigmoid probabilities into 0/1 labels with a 0.5 threshold
y_pred1 = (y_pred.ravel() > 0.5).astype(int).tolist()

y_pred1[25:40]

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]

y_test[25:40]

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0])

# Print the classification report
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred1))

              precision    recall  f1-score   support

           0       0.98      1.00      0.99      1673
           1       0.98      0.91      0.94       307

    accuracy                           0.98      1980
   macro avg       0.98      0.95      0.97      1980
weighted avg       0.98      0.98      0.98      1980
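To show how the precision, recall, and f1-score columns are derived, here is a minimal NumPy sketch on ten made-up labels (not the notebook's predictions). The function name `per_class_scores` and the toy arrays are assumptions for illustration. The toy data mimics the pattern in the report: the positive class has high precision but lower recall.

```python
import numpy as np

def per_class_scores(y_true, y_pred, positive):
    """Precision, recall and F1 for one class, treating it as the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == positive) & (y_pred == positive)))
    fp = int(np.sum((y_true != positive) & (y_pred == positive)))
    fn = int(np.sum((y_true == positive) & (y_pred != positive)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up labels: two of the four positives are missed, no false alarms.
toy_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
toy_pred = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]

p1, r1, f1_1 = per_class_scores(toy_true, toy_pred, positive=1)
p0, r0, f1_0 = per_class_scores(toy_true, toy_pred, positive=0)

print(round(p1, 2), round(r1, 2), round(f1_1, 2))  # 1.0 0.5 0.67
print(round(p0, 2), round(r0, 2), round(f1_0, 2))  # 0.75 1.0 0.86
```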

Heat Map

import seaborn as sn

cm = tf.math.confusion_matrix(labels = y_test, predictions = y_pred1)

plt.figure(figsize = (10,8))
sn.heatmap(cm, annot = True, fmt = 'd')
plt.xlabel("predicted")
plt.ylabel("actual")

Text(69.0, 0.5, 'actual')
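The heat map plots a 2x2 matrix of counts whose rows are the actual classes and whose columns are the predicted classes, which is the layout `tf.math.confusion_matrix` returns. The sketch below builds the same kind of matrix with plain NumPy on made-up labels; `confusion_counts` and the toy arrays are hypothetical names, not taken from the notebook.

```python
import numpy as np

def confusion_counts(y_true, y_pred, n_classes=2):
    """Count matrix with rows = actual class, columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # Add 1 to cell (actual, predicted) for every sample, including repeats.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

# Made-up labels for illustration only.
toy_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
toy_pred = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]

toy_cm = confusion_counts(toy_true, toy_pred)
print(toy_cm)
# [[6 0]
#  [2 2]]

# Accuracy is the diagonal (correct predictions) over the total count.
toy_accuracy = np.trace(toy_cm) / toy_cm.sum()
print(toy_accuracy)  # 0.8
```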

deepCC

!deepCC "Simple ANN.h5"

[INFO]
Reading [keras model] 'Simple ANN.h5'
[SUCCESS]
Saved 'Simple ANN_deepC/Simple ANN.onnx'
[INFO]
Reading [onnx model] 'Simple ANN_deepC/Simple ANN.onnx'
[INFO]
Model info:
  ir_vesion : 4
  doc       : 
[WARNING]
[ONNX]: terminal (input/output) dense_input's shape is less than 1. Changing it to 1.
[WARNING]
[ONNX]: terminal (input/output) dense_1's shape is less than 1. Changing it to 1.
WARN (GRAPH): found operator node with the same name (dense_1) as io node.
[INFO]
Running DNNC graph sanity check ...
[SUCCESS]
Passed sanity check.
[INFO]
Writing C++ file 'Simple ANN_deepC/Simple ANN.cpp'
[INFO]
deepSea model files are ready in 'Simple ANN_deepC/' 
[RUNNING COMMAND]
g++ -std=c++11 -O3 -fno-rtti -fno-exceptions -I. -I/opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/include -isystem /opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/packages/eigen-eigen-323c052e1731 "Simple ANN_deepC/Simple ANN.cpp" -D_AITS_MAIN -o "Simple ANN_deepC/Simple ANN.exe"
[RUNNING COMMAND]
size "Simple ANN_deepC/Simple ANN.exe"
   text	   data	    bss	    dec	    hex	filename
 140331	   2968	    760	 144059	  232bb	Simple ANN_deepC/Simple ANN.exe
[SUCCESS]
Saved model as executable "Simple ANN_deepC/Simple ANN.exe"
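The `size` line in the log can be read as follows: `dec` is the sum of the text, data, and bss sections in bytes, and `hex` is the same total in hexadecimal. A short check with the numbers copied from the log above:

```python
# Section sizes in bytes, copied from the `size` output above.
text, data, bss = 140331, 2968, 760

total = text + data + bss
print(total)               # 144059
print(format(total, "x"))  # 232bb

# The same total expressed in KiB.
print(round(total / 1024, 1))  # 140.7
```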

Output of dataset.head() (first five rows; pandas elides the middle feature columns):

   ID  molecule_name  conformation_name  f1    f2    f3   f4    f5  f6  f7  ...  f158  f159  f160  f161  f162  f163  f164  f165  f166  class
0   1       MUSK-211            211_1+1  46  -108   -60  -69  -117  49  38  ...  -308    52    -7    39   126   156   -50  -112    96      1
1   2       MUSK-211           211_1+10  41  -188  -145   22  -117  -6  57  ...   -59    -2    52   103   136   169   -61  -136    79      1
2   3       MUSK-211           211_1+11  46  -194  -145   28  -117  73  57  ...  -134  -154    57   143   142   165   -67  -145    39      1
3   4       MUSK-211           211_1+12  41  -188  -145   22  -117  -7  57  ...   -60    -4    52   104   136   168   -60  -135    80      1
4   5       MUSK-211           211_1+13  41  -188  -145   22  -117  -7  57  ...   -60    -4    52   104   137   168   -60  -135    80      1