COVID-19 Data Analysis and Prediction Using Facebook's Prophet Model
Necessary libraries
pip install pandas
pip install matplotlib
pip install seaborn
pip install fbprophet
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
import datetime as dt
import numpy as np
Dataset
The datasets used in this COVID-19 analysis and prediction project for India were obtained from Kaggle:
- covid_19_india.csv contains the date, time, state/union territory, and the confirmed, death, and cured case counts.
- StatewiseTestingDetails.csv contains the testing data for all states and union territories.
#importing main dataset
df = pd.read_csv('https://cainvas-static.s3.amazonaws.com/media/user_data/AnandKumarE0/covid_19_india.csv', parse_dates=['Date'], dayfirst=True)
df.head()
df.tail()
Changing the column names
#keeping only required columns
df= df[['Date','State/UnionTerritory','Cured','Deaths','Confirmed']]
df.columns=['date', 'state','cured','deaths','confirmed' ]
#Looking at the earlier dates
df.head()
Latest COVID-19 data (27 August 2020)
#snapshot date treated as "today" (hard-coded)
today = df[df.date == '2020-08-27']
today.shape
today
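The snapshot date above is typed in by hand, so it goes stale as soon as the CSV is refreshed. A minimal sketch of deriving it from the data instead is shown below. The helper name `latest_snapshot` and the toy frame `sample` are illustrative assumptions, not part of the original notebook; only the column names match the renamed dataset.

```python
import pandas as pd

# Toy frame standing in for df (hypothetical values, same column names).
sample = pd.DataFrame({
    "date": pd.to_datetime(["2020-08-26", "2020-08-26", "2020-08-27", "2020-08-27"]),
    "state": ["Kerala", "Tamil Nadu", "Kerala", "Tamil Nadu"],
    "confirmed": [100, 200, 110, 230],
})

def latest_snapshot(frame, date_col="date"):
    """Return the rows that belong to the most recent date in the frame."""
    latest_date = frame[date_col].max()
    return frame[frame[date_col] == latest_date]

latest = latest_snapshot(sample)
print(latest)
```

With the real data, `latest_snapshot(df)` could replace the hard-coded filter, assuming the `date` column was parsed as datetimes when the CSV was read.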
Most affected States
#Sorting data w.r.t number of confirmed cases
max_confirmed_cases=today.sort_values(by="confirmed",ascending=False)
max_confirmed_cases
#Getting states with maximum number of confirmed cases
top_states_confirmed=max_confirmed_cases[0:5]
Visualizing the 5 states most affected by COVID-19
#Making bar-plot for states with top confirmed cases
sns.set(rc={'figure.figsize':(15,10)})
sns.barplot(x="state",y="confirmed",data=top_states_confirmed,hue="state")
plt.show()
#Sorting data w.r.t number of death cases
max_death_cases=today.sort_values(by="deaths",ascending=False)
max_death_cases
Visualizing the states with the highest number of deaths
#Getting states with maximum number of death cases
top_states_death=max_death_cases[0:5]
#Making bar-plot for states with top death cases
sns.set(rc={'figure.figsize':(15,10)})
sns.barplot(x="state",y="deaths",data=top_states_death,hue="state")
plt.show()
Visualizing the states with the most COVID-19 recoveries
#Sorting data w.r.t number of cured cases
max_cured_cases=today.sort_values(by="cured",ascending=False)
max_cured_cases
#Getting states with maximum number of cured cases
top_states_cured=max_cured_cases[0:5]
#Making bar-plot for states with top cured cases
sns.set(rc={'figure.figsize':(15,10)})
sns.barplot(x="state",y="cured",data=top_states_cured,hue="state")
plt.show()
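The three rankings above (confirmed, deaths, cured) repeat the same sort-then-slice pattern. As a rough sketch, the pattern could be folded into one helper built on pandas' `DataFrame.nlargest`. The function name `top_n_states` and the toy values are assumptions made for illustration only.

```python
import pandas as pd

def top_n_states(snapshot, column, n=5):
    """Return the n rows with the largest values in `column`, largest first."""
    return snapshot.nlargest(n, column)

# Toy one-day snapshot (hypothetical numbers, same column names as the notebook).
sample = pd.DataFrame({
    "state": ["A", "B", "C", "D", "E", "F"],
    "confirmed": [50, 10, 70, 30, 90, 20],
    "deaths": [5, 1, 2, 9, 4, 0],
    "cured": [40, 8, 60, 15, 70, 19],
})

top_confirmed = top_n_states(sample, "confirmed", n=3)
top_deaths = top_n_states(sample, "deaths", n=3)
print(top_confirmed[["state", "confirmed"]])
print(top_deaths[["state", "deaths"]])
```

The returned frame can be passed to `sns.barplot` through its `data` argument exactly as the sliced frames are in the cells above.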
Analysing reports for individual states
TN = df[df['state'] == 'Tamil Nadu']
TN
Visualizing the confirmed cases of Tamil Nadu
#Visualizing confirmed cases in Tamil Nadu
sns.set(rc={'figure.figsize':(15,10)})
sns.lineplot(x="date",y="confirmed",data=TN,color="g")
plt.show()
#Visualizing death cases in Tamil Nadu
sns.set(rc={'figure.figsize':(15,10)})
sns.lineplot(x="date",y="deaths",data=TN,color="r")
plt.show()
#kerala
kerala= df[df['state'] == 'Kerala']
kerala
#Visualizing confirmed cases in Kerala
sns.set(rc={'figure.figsize':(15,10)})
sns.lineplot(x="date",y="confirmed",data=kerala,color="g")
plt.show()
#Visualizing death cases in Kerala
sns.set(rc={'figure.figsize':(15,10)})
sns.lineplot(x="date",y="deaths",data=kerala,color="r")
plt.show()
#Jammu and Kashmir
jk= df[df['state'] == 'Jammu and Kashmir']
jk
#Visualizing confirmed cases in Jammu and Kashmir
sns.set(rc={'figure.figsize':(15,10)})
sns.lineplot(x="date",y="confirmed",data=jk,color="g")
plt.show()
#Visualizing death cases in Jammu and Kashmir
sns.set(rc={'figure.figsize':(15,10)})
sns.lineplot(x="date",y="deaths",data=jk,color="r")
plt.show()
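The per-state plots above are drawn one state at a time. If the goal is to compare states directly, one possible approach is to reshape the long table into a wide one (one column per state) so that all selected states share a date index. This is only a sketch: `states_wide` and the toy frame are hypothetical names and values, and `aggfunc="max"` is an assumption chosen so that duplicate rows for the same date and state, if any, collapse to the highest cumulative count.

```python
import pandas as pd

# Toy long-format frame (hypothetical values, same column names as df).
sample = pd.DataFrame({
    "date": pd.to_datetime(["2020-08-26", "2020-08-26", "2020-08-27", "2020-08-27"]),
    "state": ["Kerala", "Tamil Nadu", "Kerala", "Tamil Nadu"],
    "confirmed": [100, 200, 110, 230],
})

def states_wide(frame, states, value_col="confirmed"):
    """Pivot the selected states into a date-indexed table, one column per state."""
    subset = frame[frame["state"].isin(states)]
    return subset.pivot_table(index="date", columns="state",
                              values=value_col, aggfunc="max")

wide = states_wide(sample, ["Kerala", "Tamil Nadu"])
print(wide)
# wide.plot() would then draw one line per state on a shared axis.
```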
Testing details of states/UTs:
#Checking state-wise testing details
tests = pd.read_csv('https://cainvas-static.s3.amazonaws.com/media/user_data/AnandKumarE0/StatewiseTestingDetails.csv')
tests
test_latest = tests[tests.Date == '2020-08-25']
test_latest
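Filtering on a single fixed date silently drops any state that has no row for exactly that day. A hedged alternative is to keep each state's most recent row, whatever its date. The sketch below assumes the `Date`, `State`, and `TotalSamples` columns used in this notebook; the helper name `latest_per_state` and the toy values are made up for illustration.

```python
import pandas as pd

# Toy testing table (hypothetical values; the two states report on different days).
sample_tests = pd.DataFrame({
    "Date": ["2020-08-24", "2020-08-25", "2020-08-23", "2020-08-24"],
    "State": ["Kerala", "Kerala", "Goa", "Goa"],
    "TotalSamples": [1000, 1100, 300, 320],
})

def latest_per_state(frame):
    """Keep the most recent row for every state."""
    out = frame.copy()
    out["Date"] = pd.to_datetime(out["Date"])  # the CSV was read without parse_dates
    return out.sort_values("Date").groupby("State").tail(1)

latest_tests = latest_per_state(sample_tests)
print(latest_tests)
```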
States that conducted the most tests:
#Sorting data w.r.t number of total samples tested
max_tests_State=test_latest.sort_values(by="TotalSamples",ascending=False)
max_tests_State
Visualizing the top 5 states where the most tests were done.
#Making bar-plot for states with max test cases
sns.set(rc={'figure.figsize':(15,10)})
sns.barplot(x="State",y="TotalSamples",data=max_tests_State[0:5],hue="State")
plt.show()
Predicting COVID-19 cases using Facebook's Prophet
Prophet is open source software released by Facebook’s Core Data Science team. It is available for download on CRAN and PyPI.
We use Prophet, a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
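The additive model described above can be written out as a short formula. The symbols below follow the usual presentation of Prophet's decomposition and are not defined elsewhere in this notebook, so treat the notation as an assumption:

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

Here $g(t)$ is the (possibly non-linear) trend, $s(t)$ collects the periodic seasonal terms, $h(t)$ is the holiday effect, and $\varepsilon_t$ is the error term. No holidays are passed to the model in this notebook, so $h(t)$ plays no role in the fits below.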
from fbprophet import Prophet
#selecting the column before summing so the non-numeric 'state' column is never aggregated
confirmed = df.groupby('date')['confirmed'].sum().reset_index()
deaths = df.groupby('date')['deaths'].sum().reset_index()
recovered = df.groupby('date')['cured'].sum().reset_index()
confirmed.tail()
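Prophet expects its input as a two-column frame named `ds` (the date) and `y` (the value), which is why each series is renamed before fitting in the sections that follow. Since the same aggregate-and-rename steps are needed for confirmed, deaths, and cured, they could be wrapped in one helper. This is a sketch only: `to_prophet_frame` and the toy frame are hypothetical and not part of the original notebook.

```python
import pandas as pd

def to_prophet_frame(frame, value_col, date_col="date"):
    """Sum `value_col` over all states per date and rename to Prophet's ds/y layout."""
    daily = frame.groupby(date_col)[value_col].sum().reset_index()
    daily.columns = ["ds", "y"]
    daily["ds"] = pd.to_datetime(daily["ds"])
    return daily

# Toy frame standing in for df (hypothetical values).
sample = pd.DataFrame({
    "date": pd.to_datetime(["2020-08-26", "2020-08-26", "2020-08-27", "2020-08-27"]),
    "state": ["Kerala", "Tamil Nadu", "Kerala", "Tamil Nadu"],
    "confirmed": [100, 200, 110, 230],
})

confirmed_ts = to_prophet_frame(sample, "confirmed")
print(confirmed_ts)
```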
Predicting the Confirmed Cases:
confirmed.columns = ['ds','y']
#confirmed['ds'] = confirmed['ds'].dt.date
confirmed['ds'] = pd.to_datetime(confirmed['ds'])
m = Prophet(interval_width=0.95)
m.fit(confirmed)
future = m.make_future_dataframe(periods=30)
future.tail()
forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
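By default `make_future_dataframe` keeps the historical dates and appends the requested future periods, so `forecast` holds fitted values for the past as well as the 30 predicted days. To look only at the genuinely predicted rows, the frame can be filtered on the last observed date. The sketch below uses toy stand-ins for the fitted series and for the output of `m.predict`; the names `history`, `forecast_demo`, and `future_only` and all numbers are illustrative assumptions.

```python
import pandas as pd

# Toy stand-ins: `history` for the ds/y frame that was fitted,
# `forecast_demo` for the frame returned by m.predict(future).
history = pd.DataFrame({
    "ds": pd.date_range("2020-08-25", periods=3),
    "y": [10, 12, 15],
})
forecast_demo = pd.DataFrame({
    "ds": pd.date_range("2020-08-25", periods=5),
    "yhat": [10.2, 12.1, 14.8, 17.0, 19.5],
    "yhat_lower": [9.0, 11.0, 13.5, 15.0, 16.5],
    "yhat_upper": [11.5, 13.0, 16.0, 19.0, 22.5],
})

def future_only(forecast_frame, history_frame):
    """Keep only the forecast rows dated after the last observed day."""
    last_observed = history_frame["ds"].max()
    return forecast_frame[forecast_frame["ds"] > last_observed]

future_rows = future_only(forecast_demo, history)
print(future_rows[["ds", "yhat", "yhat_lower", "yhat_upper"]])
```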
Plotting the prediction of confirmed cases
confirmed_forecast_plot = m.plot(forecast)
The black dots are the observed values, the blue line is the forecast, and the shaded band is the 95% uncertainty interval.
confirmed_components_plot = m.plot_components(forecast)
Prediction of Deaths
deaths.columns = ['ds','y']
deaths['ds'] = pd.to_datetime(deaths['ds'])
m = Prophet(interval_width=0.95)
m.fit(deaths)
future = m.make_future_dataframe(periods=30)
future.tail()
forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
deaths_forecast_plot = m.plot(forecast)
The black dots are the observed values and the blue line represents the forecast. The prediction suggests that COVID-19 deaths will continue to increase at about the same rate over the next 30 days.
deaths_components_plot = m.plot_components(forecast)
The component plots above show the trend of the predicted deaths.
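The statement that deaths keep rising "at the same rate" is read off the plot. One rough way to check it numerically is to compare the average day-to-day increase of the observed cumulative series with that of the forecast horizon. The sketch below runs on toy numbers; `mean_daily_increase`, `history`, and `forecast_demo` are hypothetical names, and with the real data the two frames would be `deaths` and the `forecast` returned by `m.predict`.

```python
import pandas as pd

def mean_daily_increase(series):
    """Average day-to-day change of a cumulative series."""
    return series.diff().dropna().mean()

# Toy cumulative counts (hypothetical values).
history = pd.DataFrame({
    "ds": pd.date_range("2020-08-01", periods=5),
    "y": [100, 110, 120, 130, 140],
})
forecast_demo = pd.DataFrame({
    "ds": pd.date_range("2020-08-01", periods=8),
    "yhat": [100, 110, 120, 130, 140, 150, 160, 170],
})

last_observed = history["ds"].max()
# Include the last observed day so the first future difference is defined.
future_part = forecast_demo[forecast_demo["ds"] >= last_observed]

observed_rate = mean_daily_increase(history["y"])
forecast_rate = mean_daily_increase(future_part["yhat"])
print(observed_rate, forecast_rate)
```

Similar values for the two rates would support the "same rate" reading. In practice it may be more informative to compute the observed rate over only the most recent weeks, because the cumulative curve grows much more slowly early in the outbreak.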
Predicting the Recovered Cases
recovered.columns = ['ds','y']
recovered['ds'] = pd.to_datetime(recovered['ds'])
m = Prophet(interval_width=0.95)
m.fit(recovered)
future = m.make_future_dataframe(periods=30)
future.tail()
forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
recovered_forecast_plot = m.plot(forecast)
The black dots are the observed values and the blue line represents the forecast.
recovered_components_plot = m.plot_components(forecast)
From the above observations, the predicted confirmed cases, recoveries, and deaths all appear to keep rising over the next 30 days at about the same rate as before.
So please maintain social distancing, always wear a mask, and wash your hands regularly.