Visual wake word detection

Visual wake word detection is the classification of images into two classes: images that contain one or more people and images that do not. An audio wake word system responds to a specific spoken phrase. In the same way, a visual wake word system responds to the presence of a person in the frame.
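In a deployed system, the classifier's two-class softmax output has to be turned into a yes/no wake signal. The sketch below uses a hypothetical `should_wake` helper and an illustrative 0.5 threshold. Neither is part of this notebook. The class order matches the `class_names` list (`['notperson', 'person']`) derived later from the dataset folders.

```python
def should_wake(probs, class_names, threshold=0.5):
    """Return True if the 'person' probability reaches the threshold.

    probs: softmax output for one image, ordered like class_names.
    threshold: hypothetical operating point; tune it on validation data.
    """
    person_prob = probs[class_names.index("person")]
    return person_prob >= threshold


class_names = ["notperson", "person"]
print(should_wake([0.07, 0.93], class_names))  # person likely present
print(should_wake([0.80, 0.20], class_names))  # no person detected
```

Raising the threshold produces fewer false wake-ups, but more people go undetected.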

import os
import random

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import layers, callbacks, optimizers
from PIL import Image

Dataset

The dataset is derived from COCO 2017 and was reduced to about 100 MB with a separate script.

!wget -N https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/vww.zip

!unzip -qo vww.zip  

!rm vww.zip

--2021-09-07 08:55:05--  https://cainvas-static.s3.amazonaws.com/media/user_data/cainvas-admin/vww.zip
Resolving cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)... 52.219.156.55
Connecting to cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)|52.219.156.55|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 95042991 (91M) [application/zip]
Saving to: ‘vww.zip’

vww.zip             100%[===================>]  90.64M   108MB/s    in 0.8s    

2021-09-07 08:55:06 (108 MB/s) - ‘vww.zip’ saved [95042991/95042991]

The dataset folder has two sub-folders, person and notperson. They contain images with and without people, respectively.

data_dir = 'vww/'

# Count the images in each class sub-folder
print("Number of samples")
for f in sorted(os.listdir(data_dir)):
    class_dir = os.path.join(data_dir, f)
    if os.path.isdir(class_dir):
        print(f, " : ", len(os.listdir(class_dir)))

Number of samples
notperson  :  300
person  :  300

The dataset is balanced, with 300 images per class.
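Balance affects how accuracy should be read. Because the two classes are equal in size, a model that always predicts the same class scores only 50% on the full dataset. That is the baseline for the validation accuracy reported later. The random 80/20 split may leave the validation subset slightly unbalanced. The following small sketch uses the counts printed above:

```python
# Per-class counts reported by the cell above
counts = {"notperson": 300, "person": 300}

total = sum(counts.values())
proportions = {name: n / total for name, n in counts.items()}

# Accuracy of a trivial classifier that always predicts the most frequent class
majority_baseline = max(counts.values()) / total

print(proportions)
print(f"Majority-class baseline accuracy: {majority_baseline:.0%}")
```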

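batch_size = 64

# Both calls use the same seed and validation_split, so the two subsets are
# complementary: 80% of the files for training, 20% for validation.
print("Training set")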
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
  data_dir,
  image_size=(96, 96),
  validation_split=0.2,
  subset="training",
  seed=113, 
  batch_size=batch_size)

print("Validation set")
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
  data_dir,
  image_size=(96, 96),
  validation_split=0.2,
  subset="validation",
  seed=113, 
  batch_size=batch_size)

Training set
Found 600 files belonging to 2 classes.
Using 480 files for training.
Validation set
Found 600 files belonging to 2 classes.
Using 120 files for validation.
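These counts also explain the `8/8` steps per epoch in the training log and the `2/2` steps in `model.evaluate`. The sketch assumes Keras rounds the validation count down with `int()`, and the final batch of each subset is partial. The disjointness check at the end illustrates why a shared seed gives non-overlapping subsets. It uses Python's `random` module, not Keras's internal shuffling.

```python
import math
import random

num_files = 600
validation_split = 0.2
batch_size = 64

# Number of validation files, rounded down
num_val = int(validation_split * num_files)
num_train = num_files - num_val

# Batches per epoch; the last batch may be smaller than batch_size
train_batches = math.ceil(num_train / batch_size)
val_batches = math.ceil(num_val / batch_size)

print(num_train, num_val)          # 480 120
print(train_batches, val_batches)  # 8 2

# Same seed -> same shuffled order, so slicing off the tail gives disjoint subsets.
# Illustration only; Keras shuffles the file list with its own RNG.
indices = list(range(num_files))
random.Random(113).shuffle(indices)
train_idx = indices[:-num_val]
val_idx = indices[-num_val:]
assert not set(train_idx) & set(val_idx)
```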

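Store the class names for later use. They come from the sub-folder names in alphabetical order, and each name's position in the list is its integer label.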

class_names = train_ds.class_names
print(class_names)

['notperson', 'person']
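With the default `labels='inferred'`, `image_dataset_from_directory` labels each image by its sub-folder. When `class_names` is not passed, it assigns integers in alphanumeric order of the folder names. That is why `notperson` is 0 and `person` is 1. Below is a stdlib sketch of that mapping. The `infer_class_names` helper is hypothetical and is not a Keras function.

```python
import os
import tempfile


def infer_class_names(data_dir):
    """Sorted sub-folder names; a folder's position becomes its integer label."""
    return sorted(
        d for d in os.listdir(data_dir)
        if os.path.isdir(os.path.join(data_dir, d))
    )


# Build a throwaway directory that mirrors the vww/ layout
with tempfile.TemporaryDirectory() as root:
    for name in ("person", "notperson"):
        os.makedirs(os.path.join(root, name))
    names = infer_class_names(root)

label_of = {name: i for i, name in enumerate(names)}
print(names)     # ['notperson', 'person']
print(label_of)  # {'notperson': 0, 'person': 1}
```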

Visualization

num_samples = 4    # the number of samples to be displayed in each class

for x in class_names:
    plt.figure(figsize=(10, 10))

    filenames = os.listdir(data_dir + x)

    for i in range(num_samples):
        ax = plt.subplot(1, num_samples, i + 1)
        img = Image.open(data_dir + x + '/' + filenames[i])
        plt.imshow(img)
        plt.title(x)
        plt.axis("off")

Preprocessing

Defining the input shape

print("Shape of one training batch")

for image_batch, labels_batch in train_ds:
    input_shape = image_batch[0].shape
    print("Input: ", image_batch.shape)
    print("Labels: ", labels_batch.shape)
    break

Shape of one training batch
Input:  (64, 96, 96, 3)
Labels:  (64,)

Normalizing the pixel values

The pixel values are floats in the range [0, 255]. Rescaling them to [0, 1] keeps the inputs small and helps training converge faster.

# Normalizing the pixel values

normalization_layer = layers.experimental.preprocessing.Rescaling(1./255)

train_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
val_ds = val_ds.map(lambda x, y: (normalization_layer(x), y))
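The ImageNet weights of `MobileNetV2` were trained on inputs scaled to [-1, 1]. That range is what `tf.keras.applications.mobilenet_v2.preprocess_input` produces. `Rescaling(1./255)` produces [0, 1] instead. The frozen backbone still works here, reaching about 90% validation accuracy. However, matching the pretraining range is a common adjustment to try. In Keras, one way to do it is `Rescaling(1./127.5, offset=-1)`. The stdlib sketch below compares the two mappings on single pixel values:

```python
def rescale_unit(pixel):
    """Same mapping as Rescaling(1./255): [0, 255] -> [0, 1]."""
    return pixel / 255.0


def rescale_mobilenet_v2(pixel):
    """MobileNetV2-style scaling: [0, 255] -> [-1, 1]."""
    return pixel / 127.5 - 1.0


for p in (0, 127.5, 255):
    print(p, rescale_unit(p), rescale_mobilenet_v2(p))
```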

The model

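Transfer learning: a MobileNetV2 backbone pretrained on ImageNet is loaded without its classification head and frozen. A new two-class softmax layer is then trained on top of its pooled features.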

# Load MobileNetV2 with ImageNet weights; include_top=False drops its classification layer
base_model = tf.keras.applications.MobileNetV2(weights='imagenet', input_shape=input_shape, include_top=False)

# Freeze the backbone so that only the new head is trained
base_model.trainable = False

inputs = tf.keras.Input(shape=input_shape)

# training=False keeps the BatchNormalization layers in inference mode
x = base_model(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(len(class_names), activation='softmax')(x)    # new classification layer

model = tf.keras.Model(inputs, outputs)

# Stop when val_loss has not improved for 5 epochs, then restore the best weights
cb = [callbacks.EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)]
model.summary()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_96_no_top.h5
9412608/9406464 [==============================] - 1s 0us/step
Model: "functional_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         [(None, 96, 96, 3)]       0         
_________________________________________________________________
mobilenetv2_1.00_96 (Functio (None, 3, 3, 1280)        2257984   
_________________________________________________________________
global_average_pooling2d (Gl (None, 1280)              0         
_________________________________________________________________
dense (Dense)                (None, 2)                 2562      
=================================================================
Total params: 2,260,546
Trainable params: 2,562
Non-trainable params: 2,257,984
_________________________________________________________________
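You can verify the summary numbers by hand. MobileNetV2 downsamples by a factor of 32, so a 96×96 input yields a 3×3×1280 feature map. Global average pooling collapses this map to 1280 values. The only trainable layer is the 1280→2 dense head. The following stdlib sketch reproduces that arithmetic and shows what `GlobalAveragePooling2D` computes, with nested lists standing in for tensors:

```python
# MobileNetV2 downsamples by a factor of 32
input_size = 96
feature_size = input_size // 32  # 3 -> feature map of shape (3, 3, 1280)
channels = 1280
num_classes = 2

# Dense head: one weight per (feature, class) pair plus one bias per class
head_params = channels * num_classes + num_classes

backbone_params = 2_257_984  # frozen backbone, from the summary above
total_params = backbone_params + head_params

print(feature_size, head_params, total_params)


def global_average_pool(feature_map):
    """Average an H x W x C nested list over H and W, returning C values."""
    h, w = len(feature_map), len(feature_map[0])
    c = len(feature_map[0][0])
    return [
        sum(feature_map[i][j][k] for i in range(h) for j in range(w)) / (h * w)
        for k in range(c)
    ]
```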

model.compile(loss='sparse_categorical_crossentropy', optimizer=optimizers.Adam(0.01), metrics=['accuracy'])
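`sparse_categorical_crossentropy` fits here for two reasons: the labels are integer indices (0 or 1) rather than one-hot vectors, and the dense layer already applies softmax. For each sample, the loss is the negative log of the probability assigned to the true class, and Keras averages it over the batch. The stdlib sketch below ignores Keras's small epsilon clipping and uses made-up scores:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(probs, label):
    """Per-sample loss: -log of the probability given to the true class index."""
    return -math.log(probs[label])


probs = softmax([0.5, 2.0])  # hypothetical scores for ['notperson', 'person']
print(probs)
print(sparse_categorical_crossentropy(probs, 1))  # true class: person (low loss)
print(sparse_categorical_crossentropy(probs, 0))  # true class: notperson (high loss)
```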

history = model.fit(train_ds, validation_data =  val_ds, epochs=32, callbacks = cb)

Epoch 1/32
8/8 [==============================] - 1s 183ms/step - loss: 1.3990 - accuracy: 0.6062 - val_loss: 0.6828 - val_accuracy: 0.7417
Epoch 2/32
8/8 [==============================] - 1s 73ms/step - loss: 0.4535 - accuracy: 0.8583 - val_loss: 0.3675 - val_accuracy: 0.9083
Epoch 3/32
8/8 [==============================] - 1s 75ms/step - loss: 0.2457 - accuracy: 0.9187 - val_loss: 0.3599 - val_accuracy: 0.9000
Epoch 4/32
8/8 [==============================] - 1s 73ms/step - loss: 0.1631 - accuracy: 0.9417 - val_loss: 0.3677 - val_accuracy: 0.9083
Epoch 5/32
8/8 [==============================] - 1s 72ms/step - loss: 0.0617 - accuracy: 0.9792 - val_loss: 0.3936 - val_accuracy: 0.9000
Epoch 6/32
8/8 [==============================] - 1s 72ms/step - loss: 0.0300 - accuracy: 0.9979 - val_loss: 0.4084 - val_accuracy: 0.8917
Epoch 7/32
8/8 [==============================] - 1s 71ms/step - loss: 0.0202 - accuracy: 1.0000 - val_loss: 0.4144 - val_accuracy: 0.8750
Epoch 8/32
8/8 [==============================] - 1s 72ms/step - loss: 0.0158 - accuracy: 1.0000 - val_loss: 0.4152 - val_accuracy: 0.8833

model.evaluate(val_ds)

2/2 [==============================] - 0s 9ms/step - loss: 0.3599 - accuracy: 0.9000

[0.35985198616981506, 0.8999999761581421]
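The log is consistent with the `EarlyStopping` callback (`patience=5`, `restore_best_weights=True`). Validation loss was lowest at epoch 3 (0.3599), and epochs 4 to 8 brought no improvement. Training therefore stopped after epoch 8, and the callback restored the epoch 3 weights. That is why `model.evaluate` reports exactly 0.3599 loss and 90% accuracy. Below is a simplified stdlib replay of that rule, which ignores `min_delta` and the other options of the real callback:

```python
# Validation losses copied from the training log above
val_losses = [0.6828, 0.3675, 0.3599, 0.3677, 0.3936, 0.4084, 0.4144, 0.4152]
patience = 5

best_loss = float("inf")
best_epoch = None
wait = 0
stopped_epoch = None

for epoch, loss in enumerate(val_losses, start=1):
    if loss < best_loss:  # improvement: remember it and reset the counter
        best_loss, best_epoch, wait = loss, epoch, 0
    else:
        wait += 1
        if wait >= patience:  # patience exhausted: stop training
            stopped_epoch = epoch
            break

print(best_epoch, best_loss, stopped_epoch)  # 3 0.3599 8
```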

Plotting the metrics

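def plot(history, variable, variable2):
    # Plot a training metric and its validation counterpart on a new figure
    plt.figure()
    plt.plot(range(len(history[variable])), history[variable], label=variable)
    plt.plot(range(len(history[variable2])), history[variable2], label=variable2)
    plt.title(variable)
    plt.xlabel("epoch")
    plt.legend()
    plt.show()

plot(history.history, "accuracy", 'val_accuracy')

plot(history.history, "loss", "val_loss")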

Prediction

# Pick a random sample index within one validation batch
idx = random.randint(0, batch_size - 1)

for img, label in val_ds.as_numpy_iterator():
    plt.axis('off')   # remove axes
    plt.imshow(img[idx])    # one image from the batch: (64, 96, 96, 3) --> (96, 96, 3)
    output = model.predict(np.expand_dims(img[idx], 0))    # add a batch axis: (96, 96, 3) --> (1, 96, 96, 3)
    pred = np.argmax(output[0])    # index of the most probable class
    print("Predicted: ", class_names[pred])    # map the index back to its class name
    print("True: ", class_names[label[idx]])
    print("Probability: ", output[0][pred])
    break

Predicted:  person
True:  notperson
Probability:  0.93282354

deepC

model.save('visual_wake_word.h5')

!deepCC visual_wake_word.h5

[INFO]
Reading [keras model] 'visual_wake_word.h5'
[SUCCESS]
Saved 'visual_wake_word_deepC/visual_wake_word.onnx'
[INFO]
Reading [onnx model] 'visual_wake_word_deepC/visual_wake_word.onnx'
[INFO]
Model info:
  ir_vesion : 5
  doc       : 
[WARNING]
[ONNX]: graph-node Conv1's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node expanded_conv_depthwise's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node expanded_conv_project's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_1_expand's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_1_depthwise's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_1_project's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_2_expand's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_2_depthwise's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_2_project's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_3_expand's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_3_depthwise's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_3_project's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_4_expand's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_4_depthwise's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_4_project's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_5_expand's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_5_depthwise's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_5_project's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_6_expand's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_6_depthwise's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_6_project's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_7_expand's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_7_depthwise's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_7_project's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_8_expand's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_8_depthwise's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_8_project's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_9_expand's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_9_depthwise's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_9_project's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_10_expand's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_10_depthwise's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_10_project's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_11_expand's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_11_depthwise's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_11_project's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_12_expand's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_12_depthwise's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_12_project's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_13_expand's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_13_depthwise's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_13_project's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_14_expand's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_14_depthwise's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_14_project's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_15_expand's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_15_depthwise's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_15_project's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_16_expand's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_16_depthwise's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: graph-node block_16_project's attribute auto_pad has no meaningful data.
[WARNING]
[ONNX]: terminal (input/output) input_2's shape is less than 1. Changing it to 1.
[WARNING]
[ONNX]: terminal (input/output) dense's shape is less than 1. Changing it to 1.
WARN (GRAPH): found operator node with the same name (dense) as io node.
[INFO]
Running DNNC graph sanity check ...
[SUCCESS]
Passed sanity check.
[INFO]
Writing C++ file 'visual_wake_word_deepC/visual_wake_word.cpp'
[INFO]
deepSea model files are ready in 'visual_wake_word_deepC/' 
[RUNNING COMMAND]
g++ -std=c++11 -O3 -fno-rtti -fno-exceptions -I. -I/opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/include -isystem /opt/tljh/user/lib/python3.7/site-packages/deepC-0.13-py3.7-linux-x86_64.egg/deepC/packages/eigen-eigen-323c052e1731 "visual_wake_word_deepC/visual_wake_word.cpp" -D_AITS_MAIN -o "visual_wake_word_deepC/visual_wake_word.exe"
[RUNNING COMMAND]
size "visual_wake_word_deepC/visual_wake_word.exe"
   text	   data	    bss	    dec	    hex	filename
9164869	   4168	    760	9169797	 8beb85	visual_wake_word_deepC/visual_wake_word.exe
[SUCCESS]
Saved model as executable "visual_wake_word_deepC/visual_wake_word.exe"
