Weather Classification

Credit: AITS Cainvas Community

Photo by Sergey Galtsev on Dribbble

Image tagging makes it possible to select images by their content, which is especially useful in search engines and similar applications. Here, we tag images according to the weather in the scene. There are two classes: cloudy and sunny.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
from keras import layers, optimizers, models, preprocessing, losses, callbacks
import os
import random
from PIL import Image
import tensorflow as tf
import keras


Dataset: "Two-class Weather Classification", Cewu Lu, Di Lin, Jiaya Jia, Chi-Keung Tang. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.

On Kaggle by Paula

In [2]:
!unzip -qo

--2021-01-08 14:41:24--
Resolving (
Connecting to (||:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 116150379 (111M) [application/zip]
Saving to: ‘’         100%[===================>] 110.77M  91.2MB/s    in 1.2s    

2021-01-08 14:41:25 (91.2 MB/s) - ‘’ saved [116150379/116150379]

In [3]:
# Loading the dataset

path = 'weather/'
input_shape = (256, 256, 3)    # default input shape while loading the images

batch = 64

# The train and test datasets
print("Train dataset")
train_ds = preprocessing.image_dataset_from_directory(path+'train', batch_size=batch, label_mode='binary')

print("Test dataset")
test_ds = preprocessing.image_dataset_from_directory(path+'test', batch_size=batch, label_mode='binary')
Train dataset
Found 10000 files belonging to 2 classes.
Test dataset
Found 253 files belonging to 2 classes.
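
As a rough check on the numbers above, the batch count per pass over the data follows from the file counts and the batch size of 64. The sketch below is plain arithmetic and assumes the final partial batch is kept rather than dropped; `num_batches` is a hypothetical helper, not part of Keras.

```python
import math

batch = 64          # batch size used when loading the datasets
n_train = 10000     # files found in weather/train
n_test = 253        # files found in weather/test

def num_batches(n_samples, batch_size):
    """Batches per pass over the data, assuming the last partial batch is kept."""
    return math.ceil(n_samples / batch_size)

train_batches = num_batches(n_train, batch)
test_batches = num_batches(n_test, batch)

# Size of the final (partial) test batch
last_test_batch = n_test - (test_batches - 1) * batch

print(train_batches, test_batches, last_test_batch)
```

Since no `image_size` is passed to `image_dataset_from_directory`, the loader falls back to its default of 256x256, which matches the `input_shape` defined above.
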
In [4]:
# How many samples in each class

for t in ['train', 'test']:
    print('\n', t.upper())
    for x in os.listdir(path + t):
        print(x, ' - ', len(os.listdir(path + t + '/' + x)))

 TRAIN
sunny  -  5000
cloudy  -  5000

 TEST
sunny  -  153
cloudy  -  100

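The training set is balanced (5000 images per class), while the test set is imbalanced (153 sunny, 100 cloudy). On an imbalanced set, overall accuracy alone can be misleading, so a confusion matrix is useful for checking the accuracy of each class separately.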
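
To see why overall accuracy can mislead here, consider a hypothetical baseline that always predicts the majority class. The sketch below uses only the test-set counts reported above. The baseline and its confusion matrix are illustrative assumptions, not results from any trained model.

```python
# Test-set class counts reported above
counts = {"cloudy": 100, "sunny": 153}
class_order = ["cloudy", "sunny"]
total = sum(counts.values())

# Hypothetical baseline: always predict the majority class
majority = max(counts, key=counts.get)
baseline_acc = counts[majority] / total

# Confusion matrix of that baseline (rows = true class, columns = predicted class)
cm = [
    [0, counts["cloudy"]],   # every cloudy image is predicted as sunny
    [0, counts["sunny"]],    # every sunny image is predicted as sunny
]

# Per-class accuracy (recall): diagonal entry divided by the row total
per_class_acc = {
    name: cm[i][i] / sum(cm[i]) for i, name in enumerate(class_order)
}

print(f"majority class: {majority}")
print(f"baseline accuracy: {baseline_acc:.3f}")
print(per_class_acc)
```

The baseline scores about 60% overall while getting every cloudy image wrong, which is exactly the kind of failure the per-row view of a confusion matrix exposes.
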

In [5]:
# Looking into the class labels

class_names = train_ds.class_names

print("Train class names: ", train_ds.class_names)
print("Test class names: ", test_ds.class_names)
Train class names:  ['cloudy', 'sunny']
Test class names:  ['cloudy', 'sunny']
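
With `label_mode='binary'`, the integer labels follow the order of `class_names`, so here cloudy maps to 0 and sunny to 1. Below is a minimal sketch of mapping a predicted probability back to a class name. It assumes a model with a single sigmoid output and a 0.5 threshold, neither of which is shown in this section; `decode_prediction` is a hypothetical helper.

```python
class_names = ['cloudy', 'sunny']   # order reported by the datasets above

# The index in class_names is the integer label: cloudy -> 0, sunny -> 1
label_of = {name: idx for idx, name in enumerate(class_names)}

def decode_prediction(prob_sunny, threshold=0.5):
    """Map an assumed sigmoid output (probability of label 1) to a class name."""
    return class_names[int(prob_sunny >= threshold)]

print(label_of)
print(decode_prediction(0.12), decode_prediction(0.87))
```
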


In [6]:
num_samples = 4    # the number of samples to be displayed in each class

for x in class_names:
    plt.figure(figsize=(20, 20))

    filenames = os.listdir(path + 'train/' + x)

    for i in range(num_samples):
        ax = plt.subplot(1, num_samples, i + 1)
        img = Image.open(path + 'train/' + x + '/' + filenames[i])