Coronavirus Detection using Machine Learning. Data Scientist who will save the world

Disclaimer : This blog is completely hypothetical & our only purpose is to educate people who are interested in Data Science & Machine Learning.

Welcome Knights, you have arrived at the headquarters of InfinityCodeX.

This organization exists to unleash the true power of your genius…

Welcome aboard….we need your power to save our planet from chaos named “CORONA” also known as “COVID-19”.

By the way, have I introduced myself yet? I am InfinityX, the founder & your Captain for today's mission. You have been found worthy of joining an elite army of Super-Heroes.

That is enough for an introduction… We have no time to spare… Hurry & amplify your skills to match the destructive catastrophe that we are facing.

These are the portals which will lead you to the amplification of your powers. Remember the key to true power is complete knowledge of both portals. We will assemble once you are ready.

Now, Go Knights…

PORTALS :

PORTAL-α

PORTAL-Ω

Oh! You reported back really quickly… Excellent, Knights. It seems you are ready to join the battlefield. Before stepping into the war-zone, let’s have a look at the history of our foe.

This is “CORONAVIRUS”… A formidable opponent which attacked our planet in Wuhan City, China, a few months ago. Precisely on the 1st of December 2019.

And this virus is spreading surprisingly fast…

According to our intel, the suspected source of this virus is a bat. Bats are mammals of the order Chiroptera; with their forelimbs adapted as wings, they are the only mammals capable of true and sustained flight. Bats are more manoeuvrable than birds, flying with their very long spread-out digits covered with a thin membrane or patagium.

Our intel unit received word from the WHO (World Health Organization)… The message was devastating… 15000+ deaths… Yes, you read that right: more than 15000 casualties in this short amount of time.

https://www.worldometers.info/coronavirus/

On the 30th of January 2020, the WHO (World Health Organization) declared a Red Alert, i.e. a Public Health Emergency of International Concern.

Shocked? I think you have understood how devastating this virus is… It’s a critical situation… But there is still hope for this world… Our Government, Scientists & Doctors are already on the battlefield & now the time is right to show our enemy the power of the Data Scientists of InfinityCodeX. Now the fate of our home planet lies in your hands.

Here is the DataSet which is provided by elite scientists from around the world :

Corona_Virus_Data

Download the Dataset onto your system. Now listen: the first mission of the Knights, named Kill-CORONA, is divided into 2 parts :

1.) Identifying the factors which are causing death using the power of PORTAL-α.

2.) Generating a Classification model using the power of PORTAL-Ω.

Now come on, Knights. Do or Die, Failure is not an option…

Listen Knights, now I will give you intel about how to execute our mission. So listen carefully. Remember, you hold countless lives in your hands.

Attention over here. Before diving into the mission’s details, I hope everyone is clear on the knowledge of the portals.

Okay then……….

Knights mission 1 part 1 :

Step 1.1 :

At this stage we have to equip & load all our weapons, i.e. we have to pre-install all the libraries, because there will be no time for you to reload your weapons while facing enemies.

Step 1.2 :

Load the Dataset which is given by the scientists.

Step 1.3 :

DATA WRANGLING… This is the most important process & remember, it is not a joke: if you screw anything up here, then it’s over. Then surely we are all gonna die.

This step is further divided into 3 parts :

1.) Create a dummy variable of important columns

2.) Concatenate that with our data set

3.) Then drop the column from which we created the dummy variables, plus the last of the new dummy columns, to avoid the dummy variable trap

If anyone is wetting their pants, you can leave. There is no room for scaredy-cats.

Step 1.4 :

Step 1.4 includes multiple steps like :

1.) Separating Independent & Dependent variable.

2.) Calling LinearRegression & creating a model.

3.) Training that model.

4.) Finding Co-efficient & Intercept.

After this tough battle we will reach the enemy's headquarters.

From there Knights your mission 1 part 2 will begin :

Step 2.1 :

After entering the enemy's headquarters we will analyze our data.

Step 2.2 :

Then we will train our model while climbing to the top floor of their headquarters & test our model on their system. When you reach the gate of their system control room, call me at once.

That’s an order.

Step 2.3 :

Then we will check how accurate our model is on their system.

Now Go…Go…Go…Go…

Data Scientist Who will Save The World | Coding begins

Knights mission 1 part 1 : ( Practicals )

MULTI-VARIABLE LINEAR REGRESSION

# Import the libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Load the data (pandas was already imported above as pd)

c_df=pd.read_csv(r"E:\New folder\Corona_unofficial.csv")

c_df.head()

output :

#showing all the information

c_df.info()

output :

# Counting the number of null values in each column

c_df.isnull().sum()

output :

Data Wrangling

A column of strings can't be used directly by the model, even when its values are important. To use it we have to create dummy variables (one 0/1 column per distinct value) from that column with the get_dummies method.

We will do 3 steps to use that data to our advantage

1.) Create a dummy variable of important columns

2.) Concatenate that with our data set

3.) Then drop the column from which we created the dummy variables, plus the last of the new dummy columns, to avoid the dummy variable trap
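As a compact preview of these three steps, here is a minimal sketch on a made-up toy frame (the rows and values are assumptions, not the real data set). It uses the columns= and drop_first=True parameters of pd.get_dummies to do all three steps in one call. Note that drop_first drops the first category of each column rather than the last; either choice avoids the dummy variable trap.

```python
import pandas as pd

# Toy frame that mimics some column names used in this post (values are made up)
toy = pd.DataFrame({
    "country": ["China", "Japan", "Vietnam", "China"],
    "gender": ["male", "female", "male", "female"],
    "age": [61, 35, 28, 70],
    "death": [1, 0, 0, 1],
})

# One call: create the dummies, attach them, drop the source columns and
# drop one category per column (the first one) to avoid the dummy variable trap
encoded = pd.get_dummies(toy, columns=["country", "gender"], drop_first=True)

print(encoded.columns.tolist())
```

The step-by-step version in this post does the same job by hand, which makes it easier to see what each stage produces.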

# 1.) Create a dummy variable of important columns

dummies_loc=pd.get_dummies(c_df.location)
dummies_loc

output :

# 2.) Concatenate that with our data set

corona_data=pd.concat([c_df,dummies_loc],axis=1)
corona_data

output :

# 3.) Then drop the column from which we created the dummy variables, plus the last of the new dummy columns, to avoid the dummy variable trap

corona_data=corona_data.drop(['location','Zhuhai'],axis=1)
corona_data

output :

# 1.) Create a dummy variable of important columns

dummies_country=pd.get_dummies(c_df.country)
dummies_country

output :

# 2.) Concatenate that with our data set

corona_data=pd.concat([corona_data,dummies_country],axis=1)
corona_data

output :

# 3.) Then drop the column from which we created the dummy variables, plus the last of the new dummy columns, to avoid the dummy variable trap

corona_data=corona_data.drop(['country','Vietnam'],axis=1)
corona_data

output :

# 1.) Create a dummy variable of important columns

dummies_gender=pd.get_dummies(c_df.gender)
dummies_gender

output :

# 2.) Concatenate that with our data set

corona_data=pd.concat([corona_data,dummies_gender],axis=1)
corona_data

output :

# 3.) Then drop the column from which we created the dummy variables, plus the last of the new dummy columns, to avoid the dummy variable trap

corona_data=corona_data.drop(['gender','Not-specify'],axis=1)
corona_data

output :

# Replacing index with reporting date

corona_data=corona_data.set_index("reporting date")
corona_data

output :

# Deleting unwanted columns

corona_data=corona_data.drop(['hosp_visit_date','visiting Wuhan','from Wuhan'],axis=1)
corona_data

output :

# Separating Independent & Dependent variable

X=corona_data.drop(['death'],axis=1)
y=corona_data['death']

# Calling LinearRegression & creating a model

from sklearn.linear_model import LinearRegression
model1=LinearRegression()

#Training that model

model1.fit(X,y)

output :

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)


# These are the m values in y=m*x+c, i.e. the Co-efficients (one per column of X)


model1.coef_


output :
array([-1.78353495e-04,  3.27590733e-04, -6.16209630e-04,  1.86854419e-03,
        6.96720629e-03,  1.39875741e-02,  6.02480354e-02,  8.63267754e-03,
        7.62115033e-03, -4.95165190e-03,  2.91500816e-04,  8.25701819e-03,
        2.03387422e-02, -1.45560264e-02,  5.45053491e+09,  7.48224240e+08,
       -1.93189826e+08, -6.85290102e+07,  7.63533688e+08, -6.82565144e+08,
        7.63533688e+08, -1.87495053e+09,  4.73181305e+08,  1.09167456e+09,
        4.15652371e+09, -6.82565143e+08,  1.09167456e+09, -1.20911670e+08,
        7.64621070e+08, -8.77115118e+08,  7.63533688e+08,  7.63533688e+08,
        7.63533687e+08,  4.73181304e+08, -6.82565143e+08, -6.82565143e+08,
       -6.82565144e+08,  7.48224240e+08, -1.20911669e+08,  2.10716157e+08,
        7.63533688e+08, -1.28024309e+08, -4.06393385e-03,  7.63533688e+08,
        1.09167456e+09, -1.20911670e+08,  7.48224240e+08, -1.20911670e+08,
        7.48224240e+08, -1.20911670e+08, -1.87495053e+09, -1.20911670e+08,
        1.09167455e+09,  7.48224240e+08,  2.20813798e-01, -1.20911670e+08,
       -1.20911669e+08, -1.20911670e+08, -1.20911670e+08,  1.09167456e+09,
        1.83057707e-01,  7.48224240e+08, -4.15330976e-01, -1.20911670e+08,
       -1.20911670e+08,  4.73181304e+08, -1.20911669e+08, -3.63477526e+08,
        7.48224240e+08,  5.66215434e+08,  7.48224240e+08, -1.20911669e+08,
       -1.20911670e+08, -1.20911670e+08, -5.93570509e+08, -1.65880556e+08,
        7.48224240e+08,  2.48698905e+08, -1.93189825e+08, -4.73236592e-01,
        7.48224241e+08,  7.48224240e+08,  2.24578712e+09, -4.93237954e-01,
       -5.44389716e-01,  7.48224240e+08, -5.93570508e+08, -2.09357213e+08,
       -5.49547717e+08, -1.20911670e+08,  7.63533688e+08, -7.89237140e+08,
        7.63533688e+08, -1.20911670e+08, -6.82565143e+08, -5.93570508e+08,
       -6.82565144e+08,  4.64971302e+08,  4.73181305e+08,  7.48224241e+08,
        7.63533688e+08, -8.39851439e+08,  7.48224240e+08,  7.48224240e+08,
        7.63533688e+08,  7.48224240e+08, -9.15541108e-01,  7.63533687e+08,
       -1.20911670e+08, -7.89237140e+08,  1.09167456e+09,  7.48224240e+08,
        7.48224240e+08,  7.63533688e+08,  4.64971303e+08,  4.62723539e+07,
       -1.87495053e+09, -8.39851439e+08,  1.09167456e+09, -2.16213728e+09,
        7.48224240e+08,  7.63533688e+08,  7.48224240e+08,  7.48224240e+08,
       -4.63690059e+08, -1.20911670e+08, -1.20911670e+08, -1.20911670e+08,
       -1.20911670e+08, -1.20911669e+08, -1.20911670e+08, -1.20911670e+08,
       -2.06237650e+08, -8.39851438e+08, -4.63690059e+08, -2.75039763e+08,
        7.63533687e+08, -5.41276745e+09, -1.87495053e+09, -6.82565143e+08,
        4.73181304e+08,  3.95286999e+09,  7.17725877e-01, -1.20911670e+08,
        7.48224240e+08,  1.45133633e+09, -9.70032866e-01,  1.09167456e+09,
        7.08586818e+08, -4.12751959e+09, -7.89237141e+08, -6.82565144e+08,
        1.45133633e+09, -8.39851439e+08,  5.35473978e-01,  7.48224240e+08,
       -7.89237140e+08,  5.81083871e-02,  4.73181304e+08,  4.73181305e+08,
       -1.20911670e+08, -1.20911670e+08, -1.05418910e+00, -7.89237140e+08,
       -1.20911670e+08,  7.08586818e+08, -6.82565143e+08, -1.20911669e+08,
       -5.45053491e+09,  6.85290110e+07,  8.39851439e+08,  3.63477527e+08,
       -4.15652371e+09, -7.64621069e+08, -4.62723532e+07, -1.45133633e+09,
        1.20911670e+08, -2.10716157e+08,  1.28024309e+08,  2.09357214e+08,
       -7.63533688e+08, -1.09167456e+09,  6.76132770e-01,  1.93189826e+08,
        1.87495053e+09, -5.66215434e+08,  2.16213728e+09, -7.48224240e+08,
       -2.24578712e+09,  5.49547718e+08,  5.93570509e+08, -2.48698905e+08,
       -4.64971303e+08, -7.08586818e+08,  2.06237650e+08,  4.63690059e+08,
        6.82565144e+08,  2.75039764e+08,  1.65880556e+08,  8.77115118e+08,
        5.41276745e+09, -3.95286999e+09,  4.12751959e+09,  7.89237141e+08,
       -4.73181304e+08,  4.35575854e-02, -3.45127022e-03])


# This is c in y=m*x+c, i.e. the Intercept

model1.intercept_

output :

0.4251844306557958

By inserting the values of x (the independent variables) into y=m*x+c we get an estimate of whether the person will die or not ( 0 = Not-Die , 1 = Die ), but there are so many variables that I am skipping that step.
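
For the curious, the skipped step can be sketched on a tiny made-up example. The two columns, their values and the 0.5 cut-off below are all assumptions for illustration; the point is only that plugging x into y=m*x+c by hand gives the same number as the model's predict method, and that linear regression returns a continuous score which still has to be cut off to get 0 or 1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up toy data: two independent variables and a 0/1 target
X_toy = np.array([[25, 0], [40, 1], [60, 1], [75, 0], [80, 1]], dtype=float)
y_toy = np.array([0, 0, 1, 1, 1], dtype=float)

toy_model = LinearRegression().fit(X_toy, y_toy)

# y = m1*x1 + m2*x2 + c, written out by hand for one new person
new_person = np.array([70.0, 1.0])
by_hand = float(new_person @ toy_model.coef_ + toy_model.intercept_)

# The same number straight from the model
by_model = float(toy_model.predict(new_person.reshape(1, -1))[0])

# Linear regression returns a continuous score, so we threshold it at 0.5
# (the 0.5 cut-off is an assumption for this sketch) to get 0 or 1
label = int(by_hand >= 0.5)
print(by_hand, by_model, label)
```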


LOGISTIC REGRESSION


corona_data.info()



output :

<class 'pandas.core.frame.DataFrame'>

Index: 1085 entries, 20-01-2020 to 25-02-2020
Columns: 208 entries, Unnamed: 0 to male
dtypes: int64(15), uint8(193)
memory usage: 340.1+ KB


Analyzing our Data


# Comparing number of deaths 0 = Not-dead & 1 = Dead



plt.figure(figsize=(8,7))




sns.countplot(data=corona_data,x="death")



output :








plt.figure(figsize=(7,7))




sns.countplot(x="death",hue="male",data=corona_data)



output :









plt.figure(figsize=(7,7))



sns.countplot(x="death",hue="female",data=corona_data)


output :










Test & Train our model


# Splitting our data into a training set & a test set

from sklearn.model_selection import train_test_split



X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2)




#creating a LogisticRegression model & training it with fit


from sklearn.linear_model import LogisticRegression


model=LogisticRegression()


# Fit on the training split only, so that X_test stays unseen for the evaluation below
model.fit(X_train,y_train)

output :

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,

                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False)
#model Score

model.score(X,y)


output :


0.647926267281106
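
To judge a classifier fairly, it helps to score it only on rows it never saw during training. Below is a minimal sketch of that idea on synthetic data from make_classification (a stand-in so the example runs without the CSV; the sample sizes, random_state values and max_iter are assumptions). Setting random_state makes the split repeatable, and stratify keeps the 0/1 ratio similar in both parts.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for X and y (an assumption, so the sketch runs without the CSV)
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)

# random_state makes the split repeatable; stratify keeps the class ratio similar
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Fit on the training part only
demo_model = LogisticRegression(max_iter=1000)
demo_model.fit(X_tr, y_tr)

# Score separately on seen and unseen rows
train_acc = demo_model.score(X_tr, y_tr)
test_acc = demo_model.score(X_te, y_te)
print(train_acc, test_acc)
```

The test score is the one to report, since it reflects rows the model has not seen.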


# What our model predicted

pred=model.predict(X_test)

pred


output :


array([0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0,
       0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1,
       1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1,
       0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1,
       1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0,
       1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1,
       1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1,
       0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1,
       1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1,
       0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1],  dtype=int64)



from sklearn.metrics import classification_report




val=classification_report(y_test,pred)




val


output :





# Value counts of our test set



y_test.value_counts()

output :



1    119
0     98
Name: death, dtype: int64

# This is the confusion matrix. I'll explain it in our next blog post

from sklearn import metrics
print(metrics.confusion_matrix(y_test,pred))

output :

[[63 35]
 [35 84]]

print(metrics.recall_score(y_test,pred))


output :

0.7058823529411765


Check Accuracy



# Testing Accuracy of our model

from sklearn.metrics import accuracy_score


val2=accuracy_score(y_test,pred)




val2


output :

0.6774193548387096
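
The recall and accuracy values above can be recomputed by hand from the confusion matrix. The short sketch below uses only the four numbers printed earlier and the scikit-learn layout (rows = actual class, columns = predicted class); the variable names are just for illustration.

```python
import numpy as np

# Confusion matrix printed above: rows = actual class, columns = predicted class
cm = np.array([[63, 35],
               [35, 84]])

# For a 2x2 matrix, ravel() gives: true neg, false pos, false neg, true pos
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # share of all predictions that were right
recall = tp / (tp + fn)           # share of actual deaths that the model caught
precision = tp / (tp + fp)        # share of predicted deaths that were real

print(accuracy, recall, precision)
```

Here accuracy is (84 + 63) / 217 and recall is 84 / 119, which match the accuracy_score and recall_score outputs shown above.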

Hey guys... Here is the link to my code :
https://github.com/Vegadhardik7/Git_Prac_Repo/blob/master/CORONA_PROJECT_FINAL.ipynb


Hey guys! Did you enjoy our content? Can we improve further? Tell us in the comment section & please share this with your friends & family.

InfinityCodeX
