Cainvas

Neural Style Transfer

Neural Style Transfer renders one image (the content image) in the visual style of another (the style image). Here we implement two methods of Neural Style Transfer: Optimization-based Neural Style Transfer and Fast Neural Style Transfer.

Applications of Neural Style Transfer

  1. Creating artistic images by rendering photos in the style of different style images.
  2. Augmenting image datasets with stylized variants (image data augmentation).
  3. Websites like DeepArt and Prisma use Neural Style Transfer to create artistic images.
In [1]:
!wget https://cainvas-static.s3.amazonaws.com/media/user_data/Yuvnish17/neural_style_transfer_data.zip
!unzip -qo neural_style_transfer_data.zip
--2021-07-20 05:38:09--  https://cainvas-static.s3.amazonaws.com/media/user_data/Yuvnish17/neural_style_transfer_data.zip
Resolving cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)... 52.219.156.67
Connecting to cainvas-static.s3.amazonaws.com (cainvas-static.s3.amazonaws.com)|52.219.156.67|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 64932428 (62M) [application/x-zip-compressed]
Saving to: ‘neural_style_transfer_data.zip’

neural_style_transf 100%[===================>]  61.92M  74.4MB/s    in 0.8s    

2021-07-20 05:38:10 (74.4 MB/s) - ‘neural_style_transfer_data.zip’ saved [64932428/64932428]

Importing Libraries

In [2]:
import matplotlib.pyplot as plt
import cv2
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras.applications import vgg16
from tensorflow.keras import backend as K
from tensorflow.keras.regularizers import Regularizer
from tensorflow.keras.layers import Input
from tensorflow.keras.layers import concatenate
from tensorflow.keras.models import Model,Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from neural_style_transfer_data.layers import InputNormalize,VGGNormalize,ReflectionPadding2D,Denormalize,conv_bn_relu,res_conv,dconv_bn_nolinear
import time

Optimization-based Neural Style Transfer

Loading the Input Image and the Style Image

In [3]:
img = cv2.imread('neural_style_transfer_data/content.png')
print(img.shape)
(height, width, channels) = img.shape
img_rows = 400
img_cols = int(width * img_rows / height)
result_prefix = "neural_style_transfer_generated"

# Weights of the different loss components
total_variation_weight = 1e-6
style_weight = 5e-6
content_weight = 1e-8
(556, 572, 3)

Visualizing the Style Image and the Input Image

In [4]:
figure = plt.figure(figsize=(10, 10))
style_img = cv2.imread('neural_style_transfer_data/style_image3.jpg')
style_img = cv2.cvtColor(style_img, cv2.COLOR_BGR2RGB)
plt.imshow(style_img)
plt.axis('off')
plt.title('Style Image')

figure2 = plt.figure(figsize=(10, 10))
content_img = cv2.imread('neural_style_transfer_data/content.png')
content_img = cv2.cvtColor(content_img, cv2.COLOR_BGR2RGB)
plt.imshow(content_img)
plt.axis('off')
plt.title('Content Image')
Out[4]:
Text(0.5, 1.0, 'Content Image')

Defining functions for Preprocessing and Deprocessing images

In [5]:
def preprocess_image(image_path):
    image = cv2.imread(image_path)
    image = cv2.resize(image, (img_cols, img_rows))
    image = np.array(image)
    image = np.expand_dims(image, axis=0)
    image = vgg16.preprocess_input(image)
    return image


def deprocess_image(x):
    # Util function to convert a tensor into a valid image
    x = x.reshape((img_rows, img_cols, 3))
    # Remove zero-center by mean pixel
    x[:, :, 0] += 103.939
    x[:, :, 1] += 116.779
    x[:, :, 2] += 123.68
    # 'BGR'->'RGB'
    x = x[:, :, ::-1]
    x = np.clip(x, 0, 255).astype("uint8")
    return x
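As a quick sanity check, deprocess_image should invert preprocess_image (up to clipping and float rounding). A minimal sketch, assuming the downloaded content image:

# deprocess_image() adds the ImageNet channel means back and reverses the
# channel flip applied by vgg16.preprocess_input, so a round trip should
# approximately recover the resized original.
test = preprocess_image('neural_style_transfer_data/content.png')
print(test.shape)                        # (1, 400, 411, 3)
restored = deprocess_image(test.copy())
print(restored.shape, restored.dtype)    # (400, 411, 3) uint8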

Defining various loss functions and helper functions

In [6]:
def gram_matrix(x):
    x = tf.transpose(x, (2, 0, 1))
    features = tf.reshape(x, (tf.shape(x)[0], -1))
    gram = tf.matmul(features, tf.transpose(features))
    return gram
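The Gram matrix records which feature channels co-activate across spatial positions, which is what we treat as "style". A quick shape check with a random tensor (sketch):

# For an (H, W, C) feature map the Gram matrix is (C, C): entry (i, j) is the
# dot product of flattened channels i and j, so the matrix is symmetric.
dummy = tf.random.uniform((5, 7, 3))   # hypothetical feature map
print(gram_matrix(dummy).shape)        # (3, 3)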
In [7]:
def style_loss(style, combination):
    S = gram_matrix(style)
    C = gram_matrix(combination)
    channels = 3
    size = img_rows * img_cols
    return tf.reduce_sum(tf.square(S - C)) / (channels * (3 ** 2) * (size ** 2))
In [8]:
def content_loss(base, combination):
    return tf.reduce_sum(tf.square(combination - base))
In [9]:
def total_variation_loss(x):
    a = tf.square(x[:, : img_rows - 1, : img_cols - 1, :] - x[:, 1:, : img_cols - 1, :])
    b = tf.square(x[:, : img_rows - 1, : img_cols - 1, :] - x[:, : img_rows - 1, 1:, :])
    return tf.reduce_sum(tf.pow(a + b, 1.25))
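Total variation loss penalizes differences between neighbouring pixels, so a perfectly flat image should score exactly zero. A minimal sketch:

# Every neighbouring-pixel difference of a constant image is zero,
# so the total variation loss must be 0.
flat = tf.ones((1, img_rows, img_cols, 3))
print(total_variation_loss(flat).numpy())  # 0.0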

Loading the VGG16 model and printing Model Architecture

In [10]:
# Build a VGG16 model loaded with pre-trained ImageNet weights
model = vgg16.VGG16(weights="imagenet", include_top=False)

# Get the symbolic outputs of each "key" layer (we gave them unique names).
outputs_dict = dict([(layer.name, layer.output) for layer in model.layers])

# Set up a model that returns the activation values for every layer in
# VGG16 (as a dict).
feature_extractor = keras.Model(inputs=model.inputs, outputs=outputs_dict)
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
58892288/58889256 [==============================] - 3s 0us/step
In [11]:
feature_extractor.summary()
Model: "functional_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, None, None, 3)]   0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, None, None, 64)    1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, None, None, 64)    36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, None, None, 64)    0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, None, None, 128)   73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, None, None, 128)   147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, None, None, 128)   0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, None, None, 256)   295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, None, None, 256)   0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, None, None, 512)   1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, None, None, 512)   0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, None, None, 512)   0         
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________
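Because feature_extractor maps layer names to their activations, any intermediate layer can be probed directly. A minimal sketch, using preprocess_image as defined above:

# Sketch: run the content image through the extractor and inspect one layer.
features = feature_extractor(preprocess_image('neural_style_transfer_data/content.png'))
print(features['block5_conv2'].shape)  # (1, 25, 25, 512) for the 400x411 input used here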

Defining the Feature Extraction layers and the Total Loss function of the model

In [12]:
style_layer_names = [
    "block1_conv1",
    "block2_conv1",
    "block3_conv1",
    "block4_conv1",
    "block5_conv1",
]
# The layer to use for the content loss.
content_layer_name = "block5_conv2"


def compute_loss(combination_image, base_image, style_reference_image):
    input_tensor = tf.concat(
        [base_image, style_reference_image, combination_image], axis=0
    )
    features = feature_extractor(input_tensor)

    # Initialize the loss
    loss = tf.zeros(shape=())

    # Add content loss
    layer_features = features[content_layer_name]
    base_image_features = layer_features[0, :, :, :]
    combination_features = layer_features[2, :, :, :]
    loss = loss + content_weight * content_loss(
        base_image_features, combination_features
    )
    # Add style loss
    for layer_name in style_layer_names:
        layer_features = features[layer_name]
        style_reference_features = layer_features[1, :, :, :]
        combination_features = layer_features[2, :, :, :]
        sl = style_loss(style_reference_features, combination_features)
        loss += (style_weight / len(style_layer_names)) * sl

    # Add total variation loss
    loss += total_variation_weight * total_variation_loss(combination_image)
    return loss
In [13]:
@tf.function
def compute_loss_and_grads(combination_image, base_image, style_reference_image):
    with tf.GradientTape() as tape:
        loss = compute_loss(combination_image, base_image, style_reference_image)
    grads = tape.gradient(loss, combination_image)
    return loss, grads

Computing the Losses and Transforming the Combination Image

In [14]:
# optimizer = keras.optimizers.Adam(
#     keras.optimizers.schedules.ExponentialDecay(
#         initial_learning_rate=1000.0, decay_steps=1000, decay_rate=0.96
#     )
# )
optimizer = keras.optimizers.SGD(
    keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=100.0, decay_steps=500, decay_rate=0.96
    )
)
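# Note (sketch): with this schedule the learning rate follows
#   lr(s) = 100 * 0.96 ** (s / 500)   (staircase defaults to False),
# decaying smoothly from 100 at step 0 to roughly 52 by step 8000.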

base_image = preprocess_image('neural_style_transfer_data/content.png')
style_reference_image = preprocess_image('neural_style_transfer_data/style_image3.jpg')
combination_image = tf.Variable(preprocess_image('neural_style_transfer_data/content.png'))
print(base_image.shape)
print(style_reference_image.shape)
print(combination_image.shape)
losses = []
iters = []
# combination_image = tf.Variable(content_img)

iterations = 8000
for i in range(1, iterations + 1):
    loss, grads = compute_loss_and_grads(
        combination_image, base_image, style_reference_image
    )
    optimizer.apply_gradients([(grads, combination_image)])
    losses.append(loss)
    iters.append(i)
    if i % 100 == 0:
        print("Iteration %d: loss=%.2f" % (i, loss))
    if i == iterations:
        img = deprocess_image(combination_image.numpy())
        fname = result_prefix + "_at_iteration_%d.png" % i
#         keras.preprocessing.image.save_img(fname, img)
        cv2.imwrite(fname, img)
(1, 400, 411, 3)
(1, 400, 411, 3)
(1, 400, 411, 3)
Iteration 100: loss=3669.29
Iteration 200: loss=2745.66
Iteration 300: loss=2342.40
Iteration 400: loss=2106.42
Iteration 500: loss=1948.29
Iteration 600: loss=1833.90
Iteration 700: loss=1747.25
Iteration 800: loss=1679.17
Iteration 900: loss=1623.99
Iteration 1000: loss=1578.08
Iteration 1100: loss=1539.69
Iteration 1200: loss=1506.81
Iteration 1300: loss=1478.38
Iteration 1400: loss=1453.66
Iteration 1500: loss=1431.95
Iteration 1600: loss=1412.74
Iteration 1700: loss=1395.62
Iteration 1800: loss=1380.23
Iteration 1900: loss=1366.34
Iteration 2000: loss=1353.70
Iteration 2100: loss=1342.17
Iteration 2200: loss=1331.58
Iteration 2300: loss=1321.80
Iteration 2400: loss=1312.78
Iteration 2500: loss=1304.41
Iteration 2600: loss=1296.64
Iteration 2700: loss=1289.36
Iteration 2800: loss=1282.55
Iteration 2900: loss=1276.15
Iteration 3000: loss=1270.15
Iteration 3100: loss=1264.51
Iteration 3200: loss=1259.17
Iteration 3300: loss=1254.09
Iteration 3400: loss=1249.26
Iteration 3500: loss=1244.71
Iteration 3600: loss=1240.39
Iteration 3700: loss=1236.29
Iteration 3800: loss=1232.40
Iteration 3900: loss=1228.69
Iteration 4000: loss=1225.16
Iteration 4100: loss=1221.79
Iteration 4200: loss=1218.55
Iteration 4300: loss=1215.46
Iteration 4400: loss=1212.50
Iteration 4500: loss=1209.67
Iteration 4600: loss=1206.96
Iteration 4700: loss=1204.36
Iteration 4800: loss=1201.85
Iteration 4900: loss=1199.45
Iteration 5000: loss=1197.14
Iteration 5100: loss=1194.90
Iteration 5200: loss=1192.74
Iteration 5300: loss=1190.65
Iteration 5400: loss=1188.63
Iteration 5500: loss=1186.67
Iteration 5600: loss=1184.77
Iteration 5700: loss=1182.93
Iteration 5800: loss=1181.15
Iteration 5900: loss=1179.44
Iteration 6000: loss=1177.76
Iteration 6100: loss=1176.13
Iteration 6200: loss=1174.55
Iteration 6300: loss=1173.02
Iteration 6400: loss=1171.54
Iteration 6500: loss=1170.09
Iteration 6600: loss=1168.68
Iteration 6700: loss=1167.30
Iteration 6800: loss=1165.97
Iteration 6900: loss=1164.68
Iteration 7000: loss=1163.42
Iteration 7100: loss=1162.18
Iteration 7200: loss=1160.99
Iteration 7300: loss=1159.83
Iteration 7400: loss=1158.70
Iteration 7500: loss=1157.60
Iteration 7600: loss=1156.52
Iteration 7700: loss=1155.47
Iteration 7800: loss=1154.45
Iteration 7900: loss=1153.46
Iteration 8000: loss=1152.49

Visualizing the Result

In [15]:
figure2 = plt.figure(figsize=(10, 10))
generated_img = cv2.imread('neural_style_transfer_generated_at_iteration_8000.png')
generated_img = cv2.cvtColor(generated_img, cv2.COLOR_BGR2RGB)
plt.imshow(generated_img)
plt.axis('off')
plt.title('Generated Image')
Out[15]:
Text(0.5, 1.0, 'Generated Image')

Loss vs Number of Iterations Plot (values are plotted from the 100th iteration onward)

In [16]:
iters = [i for i in range(100, 8001)]
figure = plt.figure(figsize=(10, 10))
plt.plot(iters, losses[99:])
plt.xlabel('number of iterations')
plt.ylabel('loss value')
plt.title('Loss vs Number of Iterations plot')
plt.show()

Fast Neural Style Transfer

Defining the helper functions, loss functions, and model architecture functions using Keras and TensorFlow for model architecture visualization

In [14]:
def preprocess_image(image_path, image_width, image_height):
    image = cv2.imread(image_path)
    image = cv2.resize(image, (image_width, image_height))
    image = np.array(image)
    image = np.expand_dims(image, axis=0)
#     image = vgg16.preprocess_input(image)
    return image


def deprocess_image(x):
    # Util function to convert a tensor into a valid image
    x = x.reshape((img_rows, img_cols, 3))
    # Remove zero-center by mean pixel
    x[:, :, 0] += 103.939
    x[:, :, 1] += 116.779
    x[:, :, 2] += 123.68
    # 'BGR'->'RGB'
    x = x[:, :, ::-1]
    x = np.clip(x, 0, 255).astype("uint8")
    return x
In [15]:
dummy_loss_value = K.variable(0.0)
def dummy_loss(y_true, y_pred):
    return dummy_loss_value
In [16]:
@tf.function
def compute_loss_and_grads(y_true, y_pred):
    with tf.GradientTape() as tape:
        loss = dummy_loss(y_true, y_pred)
    grads = tape.gradient(loss, y_pred)
    return loss, grads
In [17]:
def gram_matrix(x):
    x = tf.transpose(x, (2, 0, 1))
    features = tf.reshape(x, (tf.shape(x)[0], -1))
    gram = tf.matmul(features, tf.transpose(features))
    return gram
In [18]:
class StyleReconstructionRegularizer(Regularizer):
    """ Johnson et al 2015 https://arxiv.org/abs/1603.08155 """

    def __init__(self, style_feature_target, weight=1.0):
        self.style_feature_target = style_feature_target
        self.weight = weight
        self.uses_learning_phase = False
        super(StyleReconstructionRegularizer, self).__init__()

        self.style_gram = gram_matrix(style_feature_target)

    def __call__(self, x):
        output = x.output[0]  # Generated by network
        loss = self.weight * K.sum(K.mean(K.square(self.style_gram - gram_matrix(output))))

        return loss


class FeatureReconstructionRegularizer(Regularizer):
    """ Johnson et al 2015 https://arxiv.org/abs/1603.08155 """

    def __init__(self, weight=1.0):
        self.weight = weight
        self.uses_learning_phase = False
        super(FeatureReconstructionRegularizer, self).__init__()

    def __call__(self, x):
        generated = x.output[0] # Generated by network features
        content = x.output[1] # True X input features

        loss = self.weight * K.sum(K.mean(K.square(content - generated)))
        return loss


class TVRegularizer(Regularizer):
    """ Enforces smoothness in image output. """

    def __init__(self, weight=1.0):
        self.weight = weight
        self.uses_learning_phase = False
        super(TVRegularizer, self).__init__()

    def __call__(self, x):
        assert K.ndim(x.output) == 4
        x_out = x.output
        
        shape = K.shape(x_out)
        img_width, img_height, channel = (shape[1], shape[2], shape[3])
        size = img_width * img_height * channel
        if K.image_data_format() == 'channels_first':
            a = K.square(x_out[:, :, :img_width - 1, :img_height - 1] - x_out[:, :, 1:, :img_height - 1])
            b = K.square(x_out[:, :, :img_width - 1, :img_height - 1] - x_out[:, :, :img_width - 1, 1:])
        else:
            a = K.square(x_out[:, :img_width - 1, :img_height - 1, :] - x_out[:, 1:, :img_height - 1, :])
            b = K.square(x_out[:, :img_width - 1, :img_height - 1, :] - x_out[:, :img_width - 1, 1:, :])
        loss = self.weight * K.sum(K.pow(a + b, 1.25)) 
        return loss
In [19]:
def image_transform_net(img_width,img_height,tv_weight=1):
    x = Input(shape=(img_width,img_height,3))
    a = InputNormalize()(x)
    a = ReflectionPadding2D(padding=(40,40),input_shape=(img_width,img_height,3))(a)
    a = conv_bn_relu(32, 9, 9, stride=(1,1))(a)
    a = conv_bn_relu(64, 9, 9, stride=(2,2))(a)
    a = conv_bn_relu(128, 3, 3, stride=(2,2))(a)
    for i in range(5):
        a = res_conv(128,3,3)(a)
    a = dconv_bn_nolinear(64,3,3)(a)
    a = dconv_bn_nolinear(32,3,3)(a)
    a = dconv_bn_nolinear(3,9,9,stride=(1,1),activation="tanh")(a)
    # Scale output to range [0, 255] via custom Denormalize layer
    y = Denormalize(name='transform_output')(a)
    
    model = Model(inputs=x, outputs=y)
    
    if tv_weight > 0:
        add_total_variation_loss(model.layers[-1],tv_weight)
#         add_total_variation_loss(y, tv_weight)
        
    return model, x, y
#     return x, y
In [20]:
def loss_net(x_in, true_x_in, width, height, style_image_path, content_weight, style_weight):
    # Concatenate the transform-net output with the original input before feeding them to VGG
    x = concatenate([x_in, true_x_in], axis=0)
    
    # Normalize the inputs via custom VGG Normalization layer
    x = VGGNormalize(name="vgg_normalize")(x)

    vgg = vgg16.VGG16(include_top=False, weights=None, input_tensor=x)
    vgg.load_weights('neural_style_transfer_data/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5', by_name=True)
    vgg.summary()

    vgg_output_dict = dict([(layer.name, layer.output) for layer in vgg.layers[-18:]])
    vgg_layers = dict([(layer.name, layer) for layer in vgg.layers[-18:]])

    if style_weight > 0:
        add_style_loss(vgg,style_image_path , vgg_layers, vgg_output_dict, width, height,style_weight)   

    if content_weight > 0:
        add_content_loss(vgg_layers,vgg_output_dict,content_weight)

    # Freeze all VGG layers
    for layer in vgg.layers[-19:]:
        layer.trainable = False

    return vgg
In [21]:
def add_style_loss(vgg,style_image_path,vgg_layers,vgg_output_dict,img_width, img_height,weight):
    style_img = preprocess_image(style_image_path, img_width, img_height)
    print('Getting style features from VGG network.')

    style_layers = ['block1_conv2', 'block2_conv2', 'block3_conv3', 'block4_conv3']

    style_layer_outputs = []

    for layer in style_layers:
        style_layer_outputs.append(vgg_output_dict[layer])
    print(style_layer_outputs)
    print(vgg.layers[-20].input)

    vgg_style_func = K.function([vgg.input], style_layer_outputs)

    style_features = vgg_style_func([style_img])

    # Style Reconstruction Loss
    for i, layer_name in enumerate(style_layers):
        layer = vgg_layers[layer_name]

        feature_var = K.variable(value=style_features[i][0])
        style_loss = StyleReconstructionRegularizer(
                            style_feature_target=feature_var,
                            weight=weight)(layer)

        layer.add_loss(style_loss)
In [22]:
def add_content_loss(vgg_layers,vgg_output_dict,weight):
    # Feature Reconstruction Loss
    content_layer = 'block3_conv3'
    content_layer_output = vgg_output_dict[content_layer]

    layer = vgg_layers[content_layer]
    content_regularizer = FeatureReconstructionRegularizer(weight)(layer)
    layer.add_loss(content_regularizer)
In [23]:
def add_total_variation_loss(transform_output_layer,weight):
    # Total Variation Regularization
    layer = transform_output_layer  # Output layer
    tv_regularizer = TVRegularizer(weight)(layer)
    layer.add_loss(tv_regularizer)

Keras version of the model architecture used in the original Fast Neural Style Transfer technique

In [24]:
style_weight = 4.0
content_weight = 1.0
tv_weight = 1e-6
img_width = img_height = 64
style_image_path = 'neural_style_transfer_data/style_image2.jpg'

net, input_itn, output_itn = image_transform_net(img_width, img_height, tv_weight)
# x = Input(shape=(img_width, img_height, 3))
model = loss_net(output_itn, input_itn, img_width, img_height, style_image_path, content_weight, style_weight)
Model: "vgg16"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_2 (InputLayer)            [(None, 64, 64, 3)]  0                                            
__________________________________________________________________________________________________
input_normalize (InputNormalize (None, 64, 64, 3)    0           input_2[0][0]                    
__________________________________________________________________________________________________
reflection_padding2d (Reflectio (None, 144, 144, 3)  0           input_normalize[0][0]            
__________________________________________________________________________________________________
conv2d (Conv2D)                 (None, 144, 144, 32) 7808        reflection_padding2d[0][0]       
__________________________________________________________________________________________________
batch_normalization (BatchNorma (None, 144, 144, 32) 128         conv2d[0][0]                     
__________________________________________________________________________________________________
activation (Activation)         (None, 144, 144, 32) 0           batch_normalization[0][0]        
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 72, 72, 64)   165952      activation[0][0]                 
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 72, 72, 64)   256         conv2d_1[0][0]                   
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 72, 72, 64)   0           batch_normalization_1[0][0]      
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 36, 36, 128)  73856       activation_1[0][0]               
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 36, 36, 128)  512         conv2d_2[0][0]                   
__________________________________________________________________________________________________
activation_2 (Activation)       (None, 36, 36, 128)  0           batch_normalization_2[0][0]      
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 34, 34, 128)  147584      activation_2[0][0]               
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 34, 34, 128)  512         conv2d_3[0][0]                   
__________________________________________________________________________________________________
activation_3 (Activation)       (None, 34, 34, 128)  0           batch_normalization_3[0][0]      
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 32, 32, 128)  147584      activation_3[0][0]               
__________________________________________________________________________________________________
cropping2d (Cropping2D)         (None, 32, 32, 128)  0           activation_2[0][0]               
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 32, 32, 128)  512         conv2d_4[0][0]                   
__________________________________________________________________________________________________
add (Add)                       (None, 32, 32, 128)  0           cropping2d[0][0]                 
                                                                 batch_normalization_4[0][0]      
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 30, 30, 128)  147584      add[0][0]                        
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 30, 30, 128)  512         conv2d_5[0][0]                   
__________________________________________________________________________________________________
activation_4 (Activation)       (None, 30, 30, 128)  0           batch_normalization_5[0][0]      
__________________________________________________________________________________________________
conv2d_6 (Conv2D)               (None, 28, 28, 128)  147584      activation_4[0][0]               
__________________________________________________________________________________________________
cropping2d_1 (Cropping2D)       (None, 28, 28, 128)  0           add[0][0]                        
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 28, 28, 128)  512         conv2d_6[0][0]                   
__________________________________________________________________________________________________
add_1 (Add)                     (None, 28, 28, 128)  0           cropping2d_1[0][0]               
                                                                 batch_normalization_6[0][0]      
__________________________________________________________________________________________________
conv2d_7 (Conv2D)               (None, 26, 26, 128)  147584      add_1[0][0]                      
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, 26, 26, 128)  512         conv2d_7[0][0]                   
__________________________________________________________________________________________________
activation_5 (Activation)       (None, 26, 26, 128)  0           batch_normalization_7[0][0]      
__________________________________________________________________________________________________
conv2d_8 (Conv2D)               (None, 24, 24, 128)  147584      activation_5[0][0]               
__________________________________________________________________________________________________
cropping2d_2 (Cropping2D)       (None, 24, 24, 128)  0           add_1[0][0]                      
__________________________________________________________________________________________________
batch_normalization_8 (BatchNor (None, 24, 24, 128)  512         conv2d_8[0][0]                   
__________________________________________________________________________________________________
add_2 (Add)                     (None, 24, 24, 128)  0           cropping2d_2[0][0]               
                                                                 batch_normalization_8[0][0]      
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, 22, 22, 128)  147584      add_2[0][0]                      
__________________________________________________________________________________________________
batch_normalization_9 (BatchNor (None, 22, 22, 128)  512         conv2d_9[0][0]                   
__________________________________________________________________________________________________
activation_6 (Activation)       (None, 22, 22, 128)  0           batch_normalization_9[0][0]      
__________________________________________________________________________________________________
conv2d_10 (Conv2D)              (None, 20, 20, 128)  147584      activation_6[0][0]               
__________________________________________________________________________________________________
cropping2d_3 (Cropping2D)       (None, 20, 20, 128)  0           add_2[0][0]                      
__________________________________________________________________________________________________
batch_normalization_10 (BatchNo (None, 20, 20, 128)  512         conv2d_10[0][0]                  
__________________________________________________________________________________________________
add_3 (Add)                     (None, 20, 20, 128)  0           cropping2d_3[0][0]               
                                                                 batch_normalization_10[0][0]     
__________________________________________________________________________________________________
conv2d_11 (Conv2D)              (None, 18, 18, 128)  147584      add_3[0][0]                      
__________________________________________________________________________________________________
batch_normalization_11 (BatchNo (None, 18, 18, 128)  512         conv2d_11[0][0]                  
__________________________________________________________________________________________________
activation_7 (Activation)       (None, 18, 18, 128)  0           batch_normalization_11[0][0]     
__________________________________________________________________________________________________
conv2d_12 (Conv2D)              (None, 16, 16, 128)  147584      activation_7[0][0]               
__________________________________________________________________________________________________
cropping2d_4 (Cropping2D)       (None, 16, 16, 128)  0           add_3[0][0]                      
__________________________________________________________________________________________________
batch_normalization_12 (BatchNo (None, 16, 16, 128)  512         conv2d_12[0][0]                  
__________________________________________________________________________________________________
add_4 (Add)                     (None, 16, 16, 128)  0           cropping2d_4[0][0]               
                                                                 batch_normalization_12[0][0]     
__________________________________________________________________________________________________
un_pooling2d (UnPooling2D)      (None, 32, 32, 128)  0           add_4[0][0]                      
__________________________________________________________________________________________________
reflection_padding2d_1 (Reflect (None, 36, 36, 128)  0           un_pooling2d[0][0]               
__________________________________________________________________________________________________
conv2d_13 (Conv2D)              (None, 34, 34, 64)   73792       reflection_padding2d_1[0][0]     
__________________________________________________________________________________________________
batch_normalization_13 (BatchNo (None, 34, 34, 64)   256         conv2d_13[0][0]                  
__________________________________________________________________________________________________
activation_8 (Activation)       (None, 34, 34, 64)   0           batch_normalization_13[0][0]     
__________________________________________________________________________________________________
un_pooling2d_1 (UnPooling2D)    (None, 68, 68, 64)   0           activation_8[0][0]               
__________________________________________________________________________________________________
reflection_padding2d_2 (Reflect (None, 72, 72, 64)   0           un_pooling2d_1[0][0]             
__________________________________________________________________________________________________
conv2d_14 (Conv2D)              (None, 70, 70, 32)   18464       reflection_padding2d_2[0][0]     
__________________________________________________________________________________________________
batch_normalization_14 (BatchNo (None, 70, 70, 32)   128         conv2d_14[0][0]                  
__________________________________________________________________________________________________
activation_9 (Activation)       (None, 70, 70, 32)   0           batch_normalization_14[0][0]     
__________________________________________________________________________________________________
un_pooling2d_2 (UnPooling2D)    (None, 70, 70, 32)   0           activation_9[0][0]               
__________________________________________________________________________________________________
reflection_padding2d_3 (Reflect (None, 72, 72, 32)   0           un_pooling2d_2[0][0]             
__________________________________________________________________________________________________
conv2d_15 (Conv2D)              (None, 64, 64, 3)    7779        reflection_padding2d_3[0][0]     
__________________________________________________________________________________________________
batch_normalization_15 (BatchNo (None, 64, 64, 3)    12          conv2d_15[0][0]                  
__________________________________________________________________________________________________
activation_10 (Activation)      (None, 64, 64, 3)    0           batch_normalization_15[0][0]     
__________________________________________________________________________________________________
transform_output (Denormalize)  (None, 64, 64, 3)    0           activation_10[0][0]              
__________________________________________________________________________________________________
concatenate (Concatenate)       (None, 64, 64, 3)    0           transform_output[0][0]           
                                                                 input_2[0][0]                    
__________________________________________________________________________________________________
vgg_normalize (VGGNormalize)    (None, 64, 64, 3)    0           concatenate[0][0]                
__________________________________________________________________________________________________
block1_conv1 (Conv2D)           (None, 64, 64, 64)   1792        vgg_normalize[0][0]              
__________________________________________________________________________________________________
block1_conv2 (Conv2D)           (None, 64, 64, 64)   36928       block1_conv1[0][0]               
__________________________________________________________________________________________________
block1_pool (MaxPooling2D)      (None, 32, 32, 64)   0           block1_conv2[0][0]               
__________________________________________________________________________________________________
block2_conv1 (Conv2D)           (None, 32, 32, 128)  73856       block1_pool[0][0]                
__________________________________________________________________________________________________
block2_conv2 (Conv2D)           (None, 32, 32, 128)  147584      block2_conv1[0][0]               
__________________________________________________________________________________________________
block2_pool (MaxPooling2D)      (None, 16, 16, 128)  0           block2_conv2[0][0]               
__________________________________________________________________________________________________
block3_conv1 (Conv2D)           (None, 16, 16, 256)  295168      block2_pool[0][0]                
__________________________________________________________________________________________________
block3_conv2 (Conv2D)           (None, 16, 16, 256)  590080      block3_conv1[0][0]               
__________________________________________________________________________________________________
block3_conv3 (Conv2D)           (None, 16, 16, 256)  590080      block3_conv2[0][0]               
__________________________________________________________________________________________________
block3_pool (MaxPooling2D)      (None, 8, 8, 256)    0           block3_conv3[0][0]               
__________________________________________________________________________________________________
block4_conv1 (Conv2D)           (None, 8, 8, 512)    1180160     block3_pool[0][0]                
__________________________________________________________________________________________________
block4_conv2 (Conv2D)           (None, 8, 8, 512)    2359808     block4_conv1[0][0]               
__________________________________________________________________________________________________
block4_conv3 (Conv2D)           (None, 8, 8, 512)    2359808     block4_conv2[0][0]               
__________________________________________________________________________________________________
block4_pool (MaxPooling2D)      (None, 4, 4, 512)    0           block4_conv3[0][0]               
__________________________________________________________________________________________________
block5_conv1 (Conv2D)           (None, 4, 4, 512)    2359808     block4_pool[0][0]                
__________________________________________________________________________________________________
block5_conv2 (Conv2D)           (None, 4, 4, 512)    2359808     block5_conv1[0][0]               
__________________________________________________________________________________________________
block5_conv3 (Conv2D)           (None, 4, 4, 512)    2359808     block5_conv2[0][0]               
__________________________________________________________________________________________________
block5_pool (MaxPooling2D)      (None, 2, 2, 512)    0           block5_conv3[0][0]               
==================================================================================================
Total params: 16,544,591
Trainable params: 16,541,385
Non-trainable params: 3,206
__________________________________________________________________________________________________
Getting style features from VGG network.
[<tf.Tensor 'block1_conv2/Relu_1:0' shape=(None, 64, 64, 64) dtype=float32>, <tf.Tensor 'block2_conv2/Relu_1:0' shape=(None, 32, 32, 128) dtype=float32>, <tf.Tensor 'block3_conv3/Relu_1:0' shape=(None, 16, 16, 256) dtype=float32>, <tf.Tensor 'block4_conv3/Relu_1:0' shape=(None, 8, 8, 512) dtype=float32>]
[<tf.Tensor 'transform_output/mul:0' shape=(None, 64, 64, 3) dtype=float32>, <tf.Tensor 'input_2:0' shape=(None, 64, 64, 3) dtype=float32>]
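The notebook stops at visualizing this architecture, but training would follow the dummy-loss pattern: the style, content, and TV losses are already attached to layers via add_loss, so Keras only needs a placeholder loss. A hedged sketch of how training could be launched, assuming a directory of content images (the 'images_path' directory, batch size, and learning rate below are assumptions, not part of this notebook):

# Sketch only: the real losses were added through layer.add_loss above,
# so the compiled loss is the constant dummy_loss.
model.compile(optimizer=Adam(learning_rate=1e-3), loss=dummy_loss)

datagen = ImageDataGenerator()           # no augmentation, just batching
train_gen = datagen.flow_from_directory(
    'images_path',                       # hypothetical folder of content images
    target_size=(img_width, img_height),
    batch_size=4,
    class_mode='input')                  # targets = inputs (ignored by dummy_loss)

# model.fit(train_gen, epochs=2)         # uncomment to actually train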

Using a Pre-Trained PyTorch Model (Udnie Style) to Stylize the Input Image

In [25]:
model = cv2.dnn.readNetFromTorch('neural_style_transfer_data/udnie.t7')

image = cv2.imread('neural_style_transfer_data/content.png')

(h, w) = image.shape[:2]
image = cv2.resize(image, (600, h))
(h, w, c) = image.shape
print(h, w, c)
556 600 3
In [26]:
blob = cv2.dnn.blobFromImage(image, 1.0, (w, h), (103.939, 116.779, 123.680), swapRB=False, crop=False)

model.setInput(blob)
output = model.forward()
output = output.reshape((3, output.shape[2], output.shape[3]))
output[0] += 103.939
output[1] += 116.779
output[2] += 123.680
output /= 255.0
output = output.transpose(1, 2, 0)
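Before writing the stylized result to disk, it has to go back to 0-255 uint8; since the channels are still in BGR order (we never swapped them), cv2.imwrite can take it directly. A minimal sketch (the output filename is an assumption):

# output is float in roughly [0, 1] after the division by 255 above and still
# in BGR order, which is what cv2.imwrite expects.
stylized = np.clip(output * 255.0, 0, 255).astype('uint8')
cv2.imwrite('udnie_stylized.png', stylized)  # hypothetical filename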
In [27]:
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
figure = plt.figure(figsize=(10, 10))
plt.imshow(image)
plt.axis('off')

figure2 = plt.figure(figsize=(10, 10))
plt.imshow(output)
plt.axis('off')
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Out[27]:
(-0.5, 599.5, 555.5, -0.5)