Cainvas

American Sign Language Detection App

with Accelerometer, Gyroscope and PyTorch


Credit: AITS Cainvas Community

Photo by Sammi Schouten on Dribbble

American Sign Language (ASL) is a complete, natural language that has the same linguistic properties as spoken languages, with grammar that differs from English. ASL is expressed by movements of the hands and face.

To detect these hand gestures, we will build an American Sign Language detection app with deep learning.

Importing Libraries

In [1]:
import os, time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable
from torchsummary import summary

Set a fixed random seed value for reproducibility; this allows us to get the same random numbers each time the notebook is run.

In [2]:
SEED = 1337
np.random.seed(SEED)
torch.manual_seed(SEED)

cuda_available = torch.cuda.is_available()
device = torch.device("cuda" if cuda_available else "cpu")
if cuda_available:
    torch.cuda.manual_seed(SEED)

Dataset

Curated by the AITS team using the Arduino Nano 33 BLE Sense.

In [3]:
# sudo AWS_ACCESS_KEY_ID={} AWS_SECRET_ACCESS_KEY={} aws s3 cp --recursive s3://cainvas-static/media/user_data/aitswarrior/ .
In [4]:
!wget -N https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/asl_imu_dataset.zip
!unzip -qo asl_imu_dataset.zip
!rm asl_imu_dataset.zip
--2021-08-11 08:12:10--  https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/asl_imu_dataset.zip
Resolving cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)... 52.219.66.104
Connecting to cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)|52.219.66.104|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘asl_imu_dataset.zip’ not modified on server. Omitting download.

Visualize Dataset

In [5]:
def plot_gesture(filenames, type = "accelerometer"):

    for filename in filenames:
        df = pd.read_csv(filename)

        index = range(1, len(df['ax']) + 1)
        fig = plt.figure(figsize=(12,6))
        plt.title(type.title() + ' - Gesture File "' + os.path.splitext(filename)[0].upper() + '" plot')

        if type=="accelerometer":
            plt.plot(index, df['ax'], 'g.', label='x', linestyle='solid', marker=',')
            plt.plot(index, df['ay'], 'b.', label='y', linestyle='solid', marker=',')
            plt.plot(index, df['az'], 'r.', label='z', linestyle='solid', marker=',')
            plt.ylabel("Acceleration (G)")

        elif type=="gyroscope":
            plt.plot(index, df['gx'], 'g.', label='x', linestyle='solid', marker=',')
            plt.plot(index, df['gy'], 'b.', label='y', linestyle='solid', marker=',')
            plt.plot(index, df['gz'], 'r.', label='z', linestyle='solid', marker=',')
            plt.ylabel("Gyroscope (deg/sec)")

        plt.xlabel("Sample")
        plt.legend()
        plt.show()
        print()
In [6]:
gesture_files = ['asl_imu_dataset/thankyou/thankyou_dataset_1.csv', 'asl_imu_dataset/help/help_dataset_1.csv']

plot_gesture(gesture_files, type = "accelerometer")
plot_gesture(gesture_files, type = "gyroscope")




Loading the Dataset

Parse the CSV files and transform them into a format that can be used to train the 1D convolutional neural network built below.

In [7]:
FREQUENCY = 6932/60   # Around 6932 samples per 60 seconds captured through Cainvas Pailette
GESTURE_CYCLE_TIME = 4   # Each fully captured gesture is 4 seconds long
SAMPLES_PER_GESTURE = int(FREQUENCY * GESTURE_CYCLE_TIME)   # Number of samples in a gesture
    
data_dir = "asl_imu_dataset"

CLASSES = [gesture_class for gesture_class in os.listdir(data_dir) \
                       if 'ipynb_checkpoints' not in gesture_class]
NUM_GESTURES = len(CLASSES)

inputs = []

# read each csv file and push an input and output
for gesture_class in CLASSES:
    gesture_dataframes = []
    for gesture_file in os.listdir(os.path.join(data_dir, gesture_class)):
        gesture_file = os.path.join(data_dir, gesture_class, gesture_file)

        if not os.path.isfile(gesture_file) or os.path.splitext(gesture_file)[1] != ".csv":
            continue

        df = pd.read_csv(gesture_file)

        # get rid of pesky empty value lines of csv which cause NaN inputs
        df = df.dropna()
        df = df.reset_index(drop=True)

        num_recordings = int(df.shape[0] // SAMPLES_PER_GESTURE)
        print(f"\tThere are ({df.shape[0]}/{SAMPLES_PER_GESTURE}) = {num_recordings}",
            f"recordings of the '{os.path.basename(gesture_file)}' gesture.")


        df = df.loc[:(num_recordings*SAMPLES_PER_GESTURE)-1]
        
        # normalize the input data, between 0 to 1:
        # - acceleration is between: -4 to +4
        # - gyroscope is between: -2000 to +2000
        df.loc[:,['ax','ay','az']] = (df.loc[:,['ax','ay','az']]+4)/8
        df.loc[:,['gx','gy','gz']] = (df.loc[:,['gx','gy','gz']]+2000)/4000
    
        gesture_dataframes.append(df)
        
    gesture = pd.concat(gesture_dataframes, ignore_index=True).to_numpy()
    print("There are {} recordings in total for '{}' gesture\n".format((gesture.shape), gesture_class))
    inputs.append(gesture)

print("Data set parsing and augmentation complete.")
	There are (6791/462) = 14 recordings of the 'thankyou_dataset_5.csv' gesture.
	There are (6784/462) = 14 recordings of the 'thankyou_dataset_6.csv' gesture.
	There are (68927/462) = 149 recordings of the 'thank_you_dataset_599_seconds_1.csv' gesture.
	There are (6324/462) = 13 recordings of the 'thankyou_dataset_4.csv' gesture.
	There are (6777/462) = 14 recordings of the 'thankyou_dataset_1.csv' gesture.
	There are (5755/462) = 12 recordings of the 'thankyou_dataset_3.csv' gesture.
	There are (6786/462) = 14 recordings of the 'thankyou_dataset_2.csv' gesture.
There are (106260, 6) recordings in total for 'thankyou' gesture

	There are (68924/462) = 149 recordings of the 'help_dataset_599_seconds_1.csv' gesture.
	There are (6795/462) = 14 recordings of the 'help_dataset_2.csv' gesture.
	There are (6783/462) = 14 recordings of the 'help_dataset_4.csv' gesture.
	There are (6785/462) = 14 recordings of the 'help_dataset_5.csv' gesture.
	There are (68935/462) = 149 recordings of the 'help_dataset_599s_0811.csv' gesture.
	There are (6784/462) = 14 recordings of the 'help_dataset_3.csv' gesture.
	There are (6792/462) = 14 recordings of the 'help_dataset_1.csv' gesture.
	There are (6788/462) = 14 recordings of the 'help_dataset_6.csv' gesture.
There are (176484, 6) recordings in total for 'help' gesture

	There are (68933/462) = 149 recordings of the 'more_dataset_599_seconds_1.csv' gesture.
	There are (34520/462) = 74 recordings of the 'more_dataset_300_seconds.csv' gesture.
There are (103026, 6) recordings in total for 'more' gesture

	There are (68932/462) = 149 recordings of the 'no_gesture_dataset_599_seconds_5.csv' gesture.
	There are (68934/462) = 149 recordings of the 'no_gesture_dataset_599_seconds_9.csv' gesture.
	There are (68940/462) = 149 recordings of the 'no_gesture_dataset_599_seconds_7.csv' gesture.
	There are (68937/462) = 149 recordings of the 'no_gesture_dataset_599_seconds_4.csv' gesture.
	There are (68934/462) = 149 recordings of the 'no_gesture_dataset_599_seconds_8.csv' gesture.
	There are (68929/462) = 149 recordings of the 'no_gesture_dataset_599_seconds_2.csv' gesture.
	There are (68938/462) = 149 recordings of the 'no_gesture_dataset_599_seconds_6.csv' gesture.
	There are (68940/462) = 149 recordings of the 'no_gesture_599_seconds_1.csv' gesture.
	There are (68941/462) = 149 recordings of the 'no_gesture_dataset_599_seconds_3.csv' gesture.
There are (619542, 6) recordings in total for 'no_gesture' gesture

	There are (62842/462) = 136 recordings of the 'today_dataset_599_seconds_1.csv' gesture.
	There are (21843/462) = 47 recordings of the 'today_dataset_300_seconds.csv' gesture.
There are (84546, 6) recordings in total for 'today' gesture

Data set parsing and augmentation complete.

Preprocessing

Extract a 2-second peak gesture window from each 4-second recording.

In [8]:
PEAK_GESTURE_TIME = 2      # seconds
PEAK_SAMPLES = int(FREQUENCY * PEAK_GESTURE_TIME) #samples per peak gesture
peak_sample_width = int(PEAK_SAMPLES//2)


def plot_gesture(np_tensor_i, title):
    fig = plt.figure(figsize=(12,6))
    plt.title(title)
    plt.plot(np_tensor_i)
    plt.show()
    print()

preprocessed_inputs = []
no_gestures = []

def get_peak_by_axis(np_tensor_i, peak_sample_width, axis):
    idx_max = np.argmax(np_tensor_i[:,axis])
    idx_min = np.argmin(np_tensor_i[:,axis])
    index = peak_gesture_index = 0
    if abs(idx_max-idx_min) < (peak_sample_width*2):
        peak_gesture_index = int((idx_max+idx_min)//2)
    else:
        # skip gestures whose lowest and highest points
        # do not fall within a single 2-second window
        return -1

        # unreachable by design: if we did consider those points, we would
        # have to check which 2-second window is better suited
        val_max = np.max(np_tensor_i[:,axis])
        val_min = np.min(np_tensor_i[:,axis])
        peak_gesture_index = idx_max if (val_max >= val_min) else idx_min
    low = 0
    high = np_tensor_i.shape[0]
    # when the peak starts early
    if (peak_gesture_index-peak_sample_width < low):
        index = peak_sample_width
    # when the peak starts late
    elif (peak_gesture_index+peak_sample_width > high):
        index = high - peak_sample_width
    # when the peak is in between
    else:
        index = peak_gesture_index
    return index


for gesture_class_index, np_tensor in enumerate(inputs):
    
    weight_axis = relative_change = 0
    for axis in range(np_tensor.shape[1]):
        curr_relative_change = (np.max(np_tensor[:,axis])-np.min(np_tensor[:,axis]))\
                                    /(np.sum(np_tensor[:,axis])/np_tensor.shape[0])
        if (curr_relative_change > relative_change):
            relative_change = curr_relative_change
            weight_axis = axis

    # split the numpy tensor with each split containing 4 seconds recording
    num_recordings = int(np_tensor.shape[0] // SAMPLES_PER_GESTURE)
    np_tensor = np.array(np.split(np_tensor, num_recordings))
    
    temp_tensors = []

    for np_tensor_i in np_tensor:

        index = get_peak_by_axis(np_tensor_i, peak_sample_width, weight_axis)
        if (index < 0):
            continue
        
        start = 0
        end = np_tensor_i.shape[0]
        low = index-peak_sample_width
        high = index+peak_sample_width
        
        temp_tensors.append(np_tensor_i[low:high])
        
        # add the leftover non-peak samples to no_gesture to reduce flickering between classes
        if CLASSES[gesture_class_index] != "no_gesture":
            # if we can extract at least 1 second sample from left
            if ((low-start) >= peak_sample_width):
                np_no_gesture_i = np.concatenate((
                                        np_tensor_i[start:low], 
                                        np.flip(np_tensor_i[start:low], 0)))
                no_gestures.append(np_no_gesture_i[:2*peak_sample_width])
            
            if ((end-high) >= peak_sample_width):
                np_no_gesture_i = np.concatenate((
                                        np_tensor_i[high:end], 
                                        np.flip(np_tensor_i[high:end], 0)))
                no_gestures.append(np_no_gesture_i[:2*peak_sample_width])
        
    # converting list of np arrays to np array
    np_tensor = np.array(temp_tensors)
    
    preprocessed_inputs.append(np_tensor)

print("Dataset preprocessing complete")
Dataset preprocessing complete
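
As a quick sanity check on the windowing above, a minimal sketch (reusing CLASSES, preprocessed_inputs and peak_sample_width from the cells above) confirms that every extracted window is exactly 2 * peak_sample_width samples long and keeps all 6 IMU channels:

# Sanity-check sketch: each preprocessed window should span 2 * peak_sample_width
# samples (the 2-second peak) across the 6 accelerometer/gyroscope channels.
for gesture_class, np_tensor in zip(CLASSES, preprocessed_inputs):
    assert np_tensor.shape[1] == 2 * peak_sample_width
    assert np_tensor.shape[2] == 6
    print("{}: {} windows of shape {}".format(gesture_class, np_tensor.shape[0], np_tensor.shape[1:]))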

Dataset distribution

In [9]:
dist = [0]*len(CLASSES)
for i, gesture_class in enumerate(CLASSES):
    if gesture_class == "no_gesture":
        dist[i] += len(no_gestures)
    dist[i] += preprocessed_inputs[i].shape[0]
        
fig = plt.figure()
plt.bar(CLASSES, dist)
    
plt.xlabel("\nASL Gestures")
plt.ylabel("Number of samples")
plt.title("ASL Gesture dataset distribution")
plt.show()

Augmentation

Augmentation to balance the dataset

In [10]:
TOTAL_SAMPLES_REQUIRED = 60000    # for augmentation
samples_per_class = (TOTAL_SAMPLES_REQUIRED//NUM_GESTURES)    # total samples count per class
    
inputs = []
outputs = []

for gesture_class_index, np_tensor in enumerate(preprocessed_inputs):
    
    if CLASSES[gesture_class_index] == "no_gesture":
        np_tensor = np.concatenate((np.array(no_gestures), np_tensor))
            
    curr_total_samples = np_tensor.shape[0]
    
    # calculate how many times to repeat the original gesture samples
    required_repeat = (samples_per_class//curr_total_samples)+1
    
    # expanding the original gesture to augment in the next step
    aug_np_tensor = np.tile(np_tensor, (required_repeat,1,1))
    
    aug_np_tensor = aug_np_tensor[:samples_per_class-curr_total_samples]
    
    # apply random multiplicative noise of up to ±15%
    noise_threshold = 0.15
    random_noise = np.random.uniform(low=(1-noise_threshold), high=(1+noise_threshold), size=aug_np_tensor.shape)
    aug_np_tensor = np.multiply(aug_np_tensor, random_noise)
    
    # append the augmented gesture to the original one
    np_tensor = np.concatenate((np_tensor, aug_np_tensor))
    
    # flatten the 6 axes into a single feature vector per sample
    np_tensor = np_tensor.reshape(np_tensor.shape[0], -1)

    inputs += [np_tensor]
    outputs += [gesture_class_index]*samples_per_class

inputs = np.concatenate(inputs)
outputs = np.array(outputs)

# print(inputs.shape)
# print(outputs.shape)

print("Dataset augmentation complete")
Dataset augmentation complete

Dataset distribution after augmentation

In [11]:
dist = [np.count_nonzero(outputs == i) for i in range(NUM_GESTURES)]
  
fig = plt.figure()
plt.bar(CLASSES, dist)
    
plt.xlabel("\nASL Gestures")
plt.ylabel("Number of samples")
plt.title("ASL Gesture dataset distribution after augmentation")
plt.show()

Randomize and split the input and output pairs for training

Randomly split the input and output pairs into sets of data: 80% for training and 20% for testing. With this split the validation set below ends up empty; a sketch after the split output shows how to carve one out.

  • the training set is used to train the model
  • the validation set is used to measure how well the model is performing during training
  • the testing set is used to test the model after training
In [12]:
# Randomize the order of the inputs, so they can be evenly distributed for training, testing, and validation
# https://stackoverflow.com/a/37710486/2020087
num_inputs = len(inputs)
INPUT_LEN = len(inputs[0])

inputs = inputs.reshape((num_inputs, 1, INPUT_LEN))

randomize = np.arange(num_inputs)
np.random.shuffle(randomize)

# Swap the consecutive indexes (0, 1, 2, etc) with the randomized indexes
inputs = inputs[randomize]
outputs = outputs[randomize]

# Split the recordings (group of samples) into three sets: training, testing and validation
TRAIN_SPLIT = int(0.8 * num_inputs)
TEST_SPLIT = int(0.2 * num_inputs + TRAIN_SPLIT)


inputs_train, inputs_test, inputs_validate = np.split(inputs, [TRAIN_SPLIT, TEST_SPLIT])
outputs_train, outputs_test, outputs_validate = np.split(outputs, [TRAIN_SPLIT, TEST_SPLIT])

# we are training on the entire data, if it's less than 1000
# if ( inputs_train.shape[0] < 1000 ):
#     inputs_train = inputs
#     outputs_train = outputs

print("inputs train shape :", inputs_train.shape)
print("inputs test shape :", inputs_test.shape)
print("outputs train shape :", outputs_train.shape)
print("outputs test shape :", outputs_test.shape)

INPUT_LEN = inputs_train.shape[-1]   # flattened sample length (230 samples * 6 axes = 1380)
print("\nData set randomization and splitting complete.")
inputs train shape : (48000, 1, 1380)
inputs test shape : (12000, 1, 1380)
outputs train shape : (48000,)
outputs test shape : (12000,)

Data set randomization and splitting complete.
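
As mentioned above, the 80/20 split leaves the validation set empty. A minimal sketch of how one could carve out a separate validation set instead (the sketch_* names are only illustrative and are not used by the rest of the notebook):

# Sketch only (not applied below): a 70/20/10 split that yields a non-empty validation set.
sketch_train_split = int(0.7 * num_inputs)
sketch_test_split = int(0.9 * num_inputs)
sketch_train, sketch_test, sketch_validate = np.split(inputs, [sketch_train_split, sketch_test_split])
print("validation inputs shape (sketch):", sketch_validate.shape)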

Build the PyTorch Model

In [13]:
class aslSeqModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.network = nn.Sequential(
            # (L-K+2P)/S + 1
            nn.Conv1d(1, 16, kernel_size=30, stride=10),   # 1380 --> (1380-30)/10 + 1 -->  136
            nn.ReLU(),
            nn.MaxPool1d(2),  # 136//2 --> 68
            
            nn.Conv1d(16, 32, kernel_size=15, stride=5),   # 68 --> (68-15)//5 + 1 --> 11
            nn.ReLU(),
            nn.MaxPool1d(2),  # 11/2 --> 5
            
            nn.Flatten(),
            nn.Linear(32 * 5, NUM_GESTURES),
#             nn.Softmax(dim=0),
        )
        
    def forward(self, x):
        return self.network(x)

    
    
model = aslSeqModel().to(device)
model_parameters = sum(p.numel() for p in model.parameters() if p.requires_grad)

optimizer = optim.Adam(model.parameters(), lr=1e-3)

criterion = nn.CrossEntropyLoss()

print(model)
print("Model Parameter Count :", model_parameters)
aslSeqModel2Sec(
  (network): Sequential(
    (0): Conv1d(1, 16, kernel_size=(30,), stride=(10,))
    (1): ReLU()
    (2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv1d(16, 32, kernel_size=(15,), stride=(5,))
    (4): ReLU()
    (5): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Flatten(start_dim=1, end_dim=-1)
    (7): Linear(in_features=160, out_features=5, bias=True)
  )
)
Model Parameter Count : 9013
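
To double-check the layer-size arithmetic in the comments above ((L-K+2P)/S + 1 for the convolutions, halving for the pooling layers), a small sketch pushes a dummy window through each layer of model.network and prints the resulting shapes:

# Shape-check sketch: trace a single dummy window through every layer.
x = torch.randn(1, 1, inputs_train.shape[-1]).to(device)   # (batch, channels, 1380)
for layer in model.network:
    x = layer(x)
    print(layer.__class__.__name__, tuple(x.shape))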

Train the Model

In [14]:
EPOCHS = 100
BATCHES = 100    # mini-batch size
losses = []
accuracies = []

TOTAL_BATCHES = int(inputs_train.shape[0]//BATCHES)

torch_inputs_train = torch.from_numpy(np.array(inputs_train, dtype=np.float32)).to(device)
torch_outputs_train = torch.from_numpy(np.array(outputs_train, dtype=np.float32)).long().to(device)

for epoch in range(EPOCHS):
    start = time.time()
    model.train()
    running_loss = 0.0
    
    for i in range(TOTAL_BATCHES):
        # Get Samples
        data = torch_inputs_train[i*BATCHES:(i+1)*BATCHES,:,:]
        target = torch_outputs_train[i*BATCHES:(i+1)*BATCHES]

        # Init
        optimizer.zero_grad()

        # Predict
        y_pred = model(data)

        # Calculate loss
        loss = criterion(y_pred, target)
        running_loss += loss.item()
        
        # Backpropagation
        loss.backward()
        optimizer.step()
        
        # Display
        if (i == TOTAL_BATCHES-1):
            model.eval()
            output = model(torch_inputs_train)
            pred = output.data.max(1)[1]
            d = pred.eq(torch_outputs_train.data)
            accuracy = d.sum().item()/d.size().numel()
            losses.append(running_loss/TOTAL_BATCHES)
            accuracies.append(accuracy)
            end = time.time()
            print('\rTrain Epoch: {}/{} [{}/{}] (took {}ms)\t\tLoss: {:.6f}\t\tAccuracy: {}/{}={:.1f}%'.format(
                  epoch+1,
                  EPOCHS,
                  i+1, 
                  TOTAL_BATCHES, 
                  int((end-start)*1000),
                  running_loss/TOTAL_BATCHES, 
                  d.sum(), d.size().numel(), accuracy*100))
    
#     for name, param in model.named_parameters():
#        if param.requires_grad and name=="fc3.weight":
#          print ("\nparameter: ", name, model.fc3.weight.grad)
Train Epoch: 1/100 [480/480] (took 833ms)		Loss: 1.212350		Accuracy: 35998/48000=75.0%
Train Epoch: 2/100 [480/480] (took 807ms)		Loss: 0.639988		Accuracy: 39259/48000=81.8%
Train Epoch: 3/100 [480/480] (took 801ms)		Loss: 0.517074		Accuracy: 40553/48000=84.5%
Train Epoch: 4/100 [480/480] (took 815ms)		Loss: 0.440883		Accuracy: 41576/48000=86.6%
Train Epoch: 5/100 [480/480] (took 800ms)		Loss: 0.383868		Accuracy: 42365/48000=88.3%
Train Epoch: 6/100 [480/480] (took 806ms)		Loss: 0.339575		Accuracy: 43036/48000=89.7%
Train Epoch: 7/100 [480/480] (took 809ms)		Loss: 0.304411		Accuracy: 43597/48000=90.8%
Train Epoch: 8/100 [480/480] (took 803ms)		Loss: 0.276859		Accuracy: 44053/48000=91.8%
Train Epoch: 9/100 [480/480] (took 818ms)		Loss: 0.255214		Accuracy: 44389/48000=92.5%
Train Epoch: 10/100 [480/480] (took 805ms)		Loss: 0.237321		Accuracy: 44657/48000=93.0%
Train Epoch: 11/100 [480/480] (took 808ms)		Loss: 0.222150		Accuracy: 44812/48000=93.4%
Train Epoch: 12/100 [480/480] (took 802ms)		Loss: 0.209116		Accuracy: 45020/48000=93.8%
Train Epoch: 13/100 [480/480] (took 804ms)		Loss: 0.197412		Accuracy: 45218/48000=94.2%
Train Epoch: 14/100 [480/480] (took 810ms)		Loss: 0.186924		Accuracy: 45354/48000=94.5%
Train Epoch: 15/100 [480/480] (took 807ms)		Loss: 0.177741		Accuracy: 45561/48000=94.9%
Train Epoch: 16/100 [480/480] (took 817ms)		Loss: 0.169088		Accuracy: 45707/48000=95.2%
Train Epoch: 17/100 [480/480] (took 804ms)		Loss: 0.161337		Accuracy: 45796/48000=95.4%
Train Epoch: 18/100 [480/480] (took 802ms)		Loss: 0.154400		Accuracy: 45869/48000=95.6%
Train Epoch: 19/100 [480/480] (took 841ms)		Loss: 0.147983		Accuracy: 45934/48000=95.7%
Train Epoch: 20/100 [480/480] (took 809ms)		Loss: 0.142226		Accuracy: 46038/48000=95.9%
Train Epoch: 21/100 [480/480] (took 819ms)		Loss: 0.137048		Accuracy: 46079/48000=96.0%
Train Epoch: 22/100 [480/480] (took 821ms)		Loss: 0.131964		Accuracy: 46146/48000=96.1%
Train Epoch: 23/100 [480/480] (took 810ms)		Loss: 0.127527		Accuracy: 46219/48000=96.3%
Train Epoch: 24/100 [480/480] (took 814ms)		Loss: 0.123270		Accuracy: 46257/48000=96.4%
Train Epoch: 25/100 [480/480] (took 808ms)		Loss: 0.119756		Accuracy: 46326/48000=96.5%
Train Epoch: 26/100 [480/480] (took 814ms)		Loss: 0.116080		Accuracy: 46321/48000=96.5%
Train Epoch: 27/100 [480/480] (took 822ms)		Loss: 0.112561		Accuracy: 46352/48000=96.6%
Train Epoch: 28/100 [480/480] (took 806ms)		Loss: 0.109448		Accuracy: 46351/48000=96.6%
Train Epoch: 29/100 [480/480] (took 814ms)		Loss: 0.106195		Accuracy: 46343/48000=96.5%
Train Epoch: 30/100 [480/480] (took 811ms)		Loss: 0.103462		Accuracy: 46367/48000=96.6%
Train Epoch: 31/100 [480/480] (took 810ms)		Loss: 0.101087		Accuracy: 46380/48000=96.6%
Train Epoch: 32/100 [480/480] (took 839ms)		Loss: 0.098621		Accuracy: 46434/48000=96.7%
Train Epoch: 33/100 [480/480] (took 819ms)		Loss: 0.096026		Accuracy: 46463/48000=96.8%
Train Epoch: 34/100 [480/480] (took 815ms)		Loss: 0.093154		Accuracy: 46532/48000=96.9%
Train Epoch: 35/100 [480/480] (took 806ms)		Loss: 0.090529		Accuracy: 46584/48000=97.0%
Train Epoch: 36/100 [480/480] (took 805ms)		Loss: 0.088222		Accuracy: 46647/48000=97.2%
Train Epoch: 37/100 [480/480] (took 822ms)		Loss: 0.085957		Accuracy: 46746/48000=97.4%
Train Epoch: 38/100 [480/480] (took 806ms)		Loss: 0.083725		Accuracy: 46832/48000=97.6%
Train Epoch: 39/100 [480/480] (took 839ms)		Loss: 0.081438		Accuracy: 46923/48000=97.8%
Train Epoch: 40/100 [480/480] (took 810ms)		Loss: 0.079299		Accuracy: 46950/48000=97.8%
Train Epoch: 41/100 [480/480] (took 821ms)		Loss: 0.077306		Accuracy: 46995/48000=97.9%
Train Epoch: 42/100 [480/480] (took 836ms)		Loss: 0.075384		Accuracy: 46993/48000=97.9%
Train Epoch: 43/100 [480/480] (took 804ms)		Loss: 0.073655		Accuracy: 47022/48000=98.0%
Train Epoch: 44/100 [480/480] (took 810ms)		Loss: 0.071863		Accuracy: 47039/48000=98.0%
Train Epoch: 45/100 [480/480] (took 820ms)		Loss: 0.070315		Accuracy: 47039/48000=98.0%
Train Epoch: 46/100 [480/480] (took 806ms)		Loss: 0.068250		Accuracy: 47089/48000=98.1%
Train Epoch: 47/100 [480/480] (took 821ms)		Loss: 0.066221		Accuracy: 47113/48000=98.2%
Train Epoch: 48/100 [480/480] (took 810ms)		Loss: 0.064204		Accuracy: 47139/48000=98.2%
Train Epoch: 49/100 [480/480] (took 825ms)		Loss: 0.062249		Accuracy: 47162/48000=98.3%
Train Epoch: 50/100 [480/480] (took 808ms)		Loss: 0.060665		Accuracy: 47184/48000=98.3%
Train Epoch: 51/100 [480/480] (took 802ms)		Loss: 0.059234		Accuracy: 47198/48000=98.3%
Train Epoch: 52/100 [480/480] (took 817ms)		Loss: 0.057610		Accuracy: 47191/48000=98.3%
Train Epoch: 53/100 [480/480] (took 815ms)		Loss: 0.056238		Accuracy: 47212/48000=98.4%
Train Epoch: 54/100 [480/480] (took 834ms)		Loss: 0.054758		Accuracy: 47232/48000=98.4%
Train Epoch: 55/100 [480/480] (took 808ms)		Loss: 0.053634		Accuracy: 47207/48000=98.3%
Train Epoch: 56/100 [480/480] (took 806ms)		Loss: 0.052433		Accuracy: 47218/48000=98.4%
Train Epoch: 57/100 [480/480] (took 812ms)		Loss: 0.051187		Accuracy: 47200/48000=98.3%
Train Epoch: 58/100 [480/480] (took 800ms)		Loss: 0.050098		Accuracy: 47206/48000=98.3%
Train Epoch: 59/100 [480/480] (took 811ms)		Loss: 0.049147		Accuracy: 47216/48000=98.4%
Train Epoch: 60/100 [480/480] (took 802ms)		Loss: 0.047840		Accuracy: 47227/48000=98.4%
Train Epoch: 61/100 [480/480] (took 801ms)		Loss: 0.046850		Accuracy: 47294/48000=98.5%
Train Epoch: 62/100 [480/480] (took 810ms)		Loss: 0.045789		Accuracy: 47318/48000=98.6%
Train Epoch: 63/100 [480/480] (took 802ms)		Loss: 0.044906		Accuracy: 47333/48000=98.6%
Train Epoch: 64/100 [480/480] (took 815ms)		Loss: 0.043821		Accuracy: 47329/48000=98.6%
Train Epoch: 65/100 [480/480] (took 804ms)		Loss: 0.042946		Accuracy: 47376/48000=98.7%
Train Epoch: 66/100 [480/480] (took 805ms)		Loss: 0.041993		Accuracy: 47400/48000=98.8%
Train Epoch: 67/100 [480/480] (took 825ms)		Loss: 0.041399		Accuracy: 47384/48000=98.7%
Train Epoch: 68/100 [480/480] (took 807ms)		Loss: 0.040549		Accuracy: 47407/48000=98.8%
Train Epoch: 69/100 [480/480] (took 816ms)		Loss: 0.039779		Accuracy: 47426/48000=98.8%
Train Epoch: 70/100 [480/480] (took 816ms)		Loss: 0.038979		Accuracy: 47467/48000=98.9%
Train Epoch: 71/100 [480/480] (took 809ms)		Loss: 0.038271		Accuracy: 47448/48000=98.9%
Train Epoch: 72/100 [480/480] (took 826ms)		Loss: 0.037642		Accuracy: 47477/48000=98.9%
Train Epoch: 73/100 [480/480] (took 822ms)		Loss: 0.036898		Accuracy: 47489/48000=98.9%
Train Epoch: 74/100 [480/480] (took 805ms)		Loss: 0.036200		Accuracy: 47501/48000=99.0%
Train Epoch: 75/100 [480/480] (took 824ms)		Loss: 0.035528		Accuracy: 47513/48000=99.0%
Train Epoch: 76/100 [480/480] (took 806ms)		Loss: 0.034908		Accuracy: 47512/48000=99.0%
Train Epoch: 77/100 [480/480] (took 820ms)		Loss: 0.034496		Accuracy: 47535/48000=99.0%
Train Epoch: 78/100 [480/480] (took 802ms)		Loss: 0.033935		Accuracy: 47521/48000=99.0%
Train Epoch: 79/100 [480/480] (took 807ms)		Loss: 0.033220		Accuracy: 47547/48000=99.1%
Train Epoch: 80/100 [480/480] (took 826ms)		Loss: 0.032826		Accuracy: 47548/48000=99.1%
Train Epoch: 81/100 [480/480] (took 805ms)		Loss: 0.032202		Accuracy: 47553/48000=99.1%
Train Epoch: 82/100 [480/480] (took 834ms)		Loss: 0.031611		Accuracy: 47556/48000=99.1%
Train Epoch: 83/100 [480/480] (took 816ms)		Loss: 0.031304		Accuracy: 47551/48000=99.1%
Train Epoch: 84/100 [480/480] (took 802ms)		Loss: 0.030988		Accuracy: 47549/48000=99.1%
Train Epoch: 85/100 [480/480] (took 805ms)		Loss: 0.030423		Accuracy: 47552/48000=99.1%
Train Epoch: 86/100 [480/480] (took 829ms)		Loss: 0.030208		Accuracy: 47563/48000=99.1%
Train Epoch: 87/100 [480/480] (took 836ms)		Loss: 0.029694		Accuracy: 47561/48000=99.1%
Train Epoch: 88/100 [480/480] (took 803ms)		Loss: 0.029256		Accuracy: 47571/48000=99.1%
Train Epoch: 89/100 [480/480] (took 811ms)		Loss: 0.028695		Accuracy: 47576/48000=99.1%
Train Epoch: 90/100 [480/480] (took 850ms)		Loss: 0.028469		Accuracy: 47579/48000=99.1%
Train Epoch: 91/100 [480/480] (took 808ms)		Loss: 0.027898		Accuracy: 47599/48000=99.2%
Train Epoch: 92/100 [480/480] (took 811ms)		Loss: 0.027443		Accuracy: 47597/48000=99.2%
Train Epoch: 93/100 [480/480] (took 808ms)		Loss: 0.027310		Accuracy: 47609/48000=99.2%
Train Epoch: 94/100 [480/480] (took 809ms)		Loss: 0.026680		Accuracy: 47627/48000=99.2%
Train Epoch: 95/100 [480/480] (took 819ms)		Loss: 0.026341		Accuracy: 47637/48000=99.2%
Train Epoch: 96/100 [480/480] (took 805ms)		Loss: 0.025901		Accuracy: 47648/48000=99.3%
Train Epoch: 97/100 [480/480] (took 826ms)		Loss: 0.025316		Accuracy: 47645/48000=99.3%
Train Epoch: 98/100 [480/480] (took 805ms)		Loss: 0.025137		Accuracy: 47649/48000=99.3%
Train Epoch: 99/100 [480/480] (took 811ms)		Loss: 0.024581		Accuracy: 47644/48000=99.3%
Train Epoch: 100/100 [480/480] (took 835ms)		Loss: 0.024184		Accuracy: 47649/48000=99.3%

Plot

In [15]:
plt.plot(accuracies)
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.show()

plt.plot(losses)
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.show()

Evaluate

In [16]:
evaluate_x = torch.from_numpy(np.array(inputs_test, dtype=np.float32)).to(device)
evaluate_y = torch.from_numpy(outputs_test).to(device)
# print(evaluate_x.shape)
# print(evaluate_y.shape)
model.eval()
output = model(evaluate_x)
# print(output)
pred = output.data.max(1)[1]
# print(pred)
d = pred.eq(evaluate_y.data)
accuracy = d.sum().item()/d.size().numel()
    
print('Test Accuracy: {:.4f}%'.format(accuracy*100))
Test Accuracy: 98.2750%
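
The overall test accuracy hides how individual gestures perform. A short sketch (reusing pred and evaluate_y from the cell above) builds a confusion matrix with NumPy and prints per-class accuracy:

# Per-class accuracy sketch based on the test predictions above.
pred_np = pred.cpu().numpy()
true_np = evaluate_y.cpu().numpy()
conf_mat = np.zeros((NUM_GESTURES, NUM_GESTURES), dtype=int)
for t, p in zip(true_np, pred_np):
    conf_mat[t, p] += 1
for i, gesture_class in enumerate(CLASSES):
    print("{:<12s} accuracy: {:.2f}%".format(gesture_class, 100.0 * conf_mat[i, i] / conf_mat[i].sum()))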

Save as ONNX Model

In [17]:
dummy_input = torch.randn(np.expand_dims(inputs_train[0], axis=0).shape).to(device)
torch.onnx.export(model, (dummy_input), "./asl_imu.onnx", verbose=False)
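
Optionally, the exported graph can be cross-checked against the PyTorch model before compiling it with deepCC. This sketch assumes the onnxruntime package is available in the environment (it is not used anywhere above, so treat it as an assumption):

# Cross-check sketch: run the same dummy input through onnxruntime and PyTorch.
import onnxruntime as ort   # assumption: onnxruntime is installed
sess = ort.InferenceSession("./asl_imu.onnx", providers=["CPUExecutionProvider"])
ort_out = sess.run(None, {sess.get_inputs()[0].name: dummy_input.cpu().numpy()})[0]
torch_out = model(dummy_input).detach().cpu().numpy()
print("Max abs difference (ONNX vs PyTorch):", np.abs(ort_out - torch_out).max())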

deepCC

In [18]:
!deepCC asl_imu.onnx
[INFO]
Reading [onnx model] 'asl_imu.onnx'
[INFO]
Model info:
  ir_vesion : 6
  doc       : 
[INFO]
Running DNNC graph sanity check ...
[SUCCESS]
Passed sanity check.
[INFO]
Writing C++ file 'asl_imu_deepC/asl_imu.cpp'
[INFO]
deepSea model files are ready in 'asl_imu_deepC/' 
[RUNNING COMMAND]
g++ -std=c++11 -O3 -fno-rtti -fno-exceptions -I. -I/opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/include -isystem /opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/packages/eigen-eigen-323c052e1731 "asl_imu_deepC/asl_imu.cpp" -D_AITS_MAIN -o "asl_imu_deepC/asl_imu.exe"
[RUNNING COMMAND]
size "asl_imu_deepC/asl_imu.exe"
   text	   data	    bss	    dec	    hex	filename
 197773	   3152	    760	 201685	  313d5	asl_imu_deepC/asl_imu.exe
[SUCCESS]
Saved model as executable "asl_imu_deepC/asl_imu.exe"

DeepSea vs PyTorch Model Prediction

In [19]:
def compare(df, true_gesture, no_of_comparison=10):
    df = df.dropna()
    df = df.reset_index(drop=True)
    
    FREQUENCY = 6932/60   # Around 6932 samples per 60 seconds captured through Cainvas Pailette
    GESTURE_CYCLE_TIME = 2   # each window passed to the model is 2 seconds long
    SAMPLES_PER_GESTURE = int(FREQUENCY * GESTURE_CYCLE_TIME)   # Number of samples in a gesture
    
    random_indices = np.random.randint(0, 
                                       high = len(df) // SAMPLES_PER_GESTURE, 
                                       dtype = int, 
                                       size = no_of_comparison)
    for i in random_indices:
        tensor = []
        
        for j in range(SAMPLES_PER_GESTURE): 
            index = i * SAMPLES_PER_GESTURE + j
            
            # normalize the input data, between 0 to 1:
            # - acceleration is between: -4 to +4
            # - gyroscope is between: -2000 to +2000
            tensor += [
                (df['ax'][index] + 4) / 8,
                (df['ay'][index] + 4) / 8,
                (df['az'][index] + 4) / 8,
                (df['gx'][index] + 2000) / 4000,
                (df['gy'][index] + 2000) / 4000,
                (df['gz'][index] + 2000) / 4000
            ]

        np_tensor = np.expand_dims(np.array(tensor, dtype=np.float32), axis=(0, 1))  # (1386,) --> (1, 1, 1386)

        print("True: \t\t\t", true_gesture)

        # torch
        model.eval()
        t_tensor = torch.from_numpy(np_tensor).to(device)
        t_output = model(t_tensor)
        
        sm = torch.nn.Softmax(dim=1)
        probabilities = sm(t_output) 
        
        t_pred = t_output.data.max(1)[1]
        print("Predict [PyTorch]: \t", CLASSES[t_pred], "({})".format(max(probabilities.tolist()[0])))
        
        # deepC
        np.savetxt('sample.data', np_tensor.flatten())
        !asl_imu_deepC/asl_imu.exe sample.data &> /dev/null
        dc_output = np.loadtxt('deepSea_result_1.out')

        probabilities = sm(torch.from_numpy(dc_output).unsqueeze(0).to(device))        
        dc_pred = np.argmax(dc_output)
        print("Predict [DeepSea]: \t", CLASSES[dc_pred], "({})".format(max(probabilities.tolist()[0])))
            
        print()
In [20]:
gesture_files = []
    
for gesture_class in CLASSES:
    for gesture_file in os.listdir(os.path.join(data_dir, gesture_class)):
        if 'ipynb_checkpoints' in gesture_file:
            continue
        else:
            gesture_files.append(os.path.join(data_dir, gesture_class, gesture_file))
            break

for gesture_file in gesture_files:
    df = pd.read_csv(gesture_file)
    true_gesture = os.path.basename(os.path.dirname(gesture_file))
    compare(df, true_gesture, 3)
True: 			 thankyou
Predict [PyTorch]: 	 thankyou (1.0)
Predict [DeepSea]: 	 thankyou (0.9999999866159117)

True: 			 thankyou
Predict [PyTorch]: 	 no_gesture (0.9641393423080444)
Predict [DeepSea]: 	 no_gesture (0.9641395144346091)

True: 			 thankyou
Predict [PyTorch]: 	 thankyou (0.994273841381073)
Predict [DeepSea]: 	 thankyou (0.9942740665435523)

True: 			 help
Predict [PyTorch]: 	 help (0.9991342425346375)
Predict [DeepSea]: 	 help (0.999134179954735)

True: 			 help
Predict [PyTorch]: 	 help (0.9996814727783203)
Predict [DeepSea]: 	 help (0.9996814437631428)

True: 			 help
Predict [PyTorch]: 	 no_gesture (0.9997240900993347)
Predict [DeepSea]: 	 no_gesture (0.9997241452807388)

True: 			 more
Predict [PyTorch]: 	 no_gesture (0.5616773962974548)
Predict [DeepSea]: 	 no_gesture (0.5616790362944595)

True: 			 more
Predict [PyTorch]: 	 more (0.9999940395355225)
Predict [DeepSea]: 	 more (0.9999939818500072)

True: 			 more
Predict [PyTorch]: 	 no_gesture (0.9999873638153076)
Predict [DeepSea]: 	 no_gesture (0.9999873431894494)

True: 			 no_gesture
Predict [PyTorch]: 	 no_gesture (0.9999313354492188)
Predict [DeepSea]: 	 no_gesture (0.9999313297635943)

True: 			 no_gesture
Predict [PyTorch]: 	 no_gesture (0.9999257326126099)
Predict [DeepSea]: 	 no_gesture (0.9999257099583713)

True: 			 no_gesture
Predict [PyTorch]: 	 no_gesture (0.9999281167984009)
Predict [DeepSea]: 	 no_gesture (0.9999280948155054)

True: 			 today
Predict [PyTorch]: 	 no_gesture (0.9825952053070068)
Predict [DeepSea]: 	 no_gesture (0.9825938842247222)

True: 			 today
Predict [PyTorch]: 	 today (0.7753018140792847)
Predict [DeepSea]: 	 today (0.775299282920433)

True: 			 today
Predict [PyTorch]: 	 no_gesture (0.9996546506881714)
Predict [DeepSea]: 	 no_gesture (0.999654600215396)


The above results were generated from random 2-second windows of the 4-second recordings, so a prediction may differ from the true label. The point of this comparison is that the PyTorch and DeepSea predictions and probabilities match.


deepCC for Arduino Nano 33 BLE Sense

In [21]:
!rm -rf asl_imu_deepC/
!deepCC asl_imu.onnx --board="Arduino Nano 33 BLE Sense" --debug --archive --bundle
[INFO]
Reading [onnx model] 'asl_imu.onnx'
[INFO]
Model info:
  ir_vesion : 6
  doc       : 
[INFO]
Running DNNC graph sanity check ...
[SUCCESS]
Passed sanity check.
[INFO]
Writing C++ file 'asl_imu_deepC/asl_imu.cpp'
[INFO]
deepSea model files are ready in 'asl_imu_deepC/' 
[RUNNING COMMAND]
arm-none-eabi-g++ -std=c++11 -O3 -mcpu=cortex-m4 -specs=nosys.specs -mthumb -fno-exceptions -fno-rtti -msoft-float -mfloat-abi=softfp -I. -I/opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/include -I /opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/packages/eigen-eigen-323c052e1731 -c "asl_imu_deepC/asl_imu.cpp" -o "asl_imu_deepC/asl_imu.o"
[RUNNING COMMAND]
arm-none-eabi-ar rcs "asl_imu_deepC/lib_asl_imu.a" "asl_imu_deepC/asl_imu.o"
[RUNNING COMMAND]
size "asl_imu_deepC/lib_asl_imu.a"
   text	   data	    bss	    dec	    hex	filename
  96930	      4	     96	  97030	  17b06	asl_imu.o (ex asl_imu_deepC/lib_asl_imu.a)
[SUCCESS]
Saved model as archive "asl_imu_deepC/lib_asl_imu.a"
[DEBUG]
Intermediate files won't be removed.
[BUNDLE]
Bundle "asl_imu_deepC/asl_imu.zip" generated.