InfinityCodeX



Logistic Regression in Machine Learning



Q.1) What is Logistic Regression?


Logistic Regression is a statistical classification model for categorical dependent variables. In its most common form it deals with data whose output is binary, such as True/False, 0/1 or Yes/No. For example, is the patient suffering from coronavirus or not?

Logistic Regression is used to describe data with a binary outcome & to explain the relationship between the dependent variable & one or more independent variables. These independent (input) variables can be discrete or continuous, and can be measured at the nominal, ordinal, interval or ratio level.

Logistic Regression is a predictive analysis, which means it predicts the output based on the given input data set.

Q.2) Why is Logistic Regression called Logistic Regression?


Let’s break the term Logistic Regression into 2 parts :

1.     Logistic
2.     Regression

Now let us understand both terms one by one :

1. Logistic:     What is Logistic?


Here the term Logistic refers to the logistic function, also called the Sigmoid function: an S-shaped curve that maps any real number to a value between 0 and 1.

It should not be confused with "logistics", the organization & distribution of materials & products, which is unrelated to this model.

2. Regression:   What is Regression?


Regression Analysis is a predictive modeling technique. It estimates the relationship between a dependent variable, also called the target, & one or more independent variables, also known as predictors.

Now that we know the meaning of both terms, Logistic & Regression, we can combine them: a predictive modeling technique which estimates the relationship between independent & dependent variables & passes the result through the logistic function, giving a value between 0 and 1 that is read as a binary output, is known as Logistic Regression.

Q.3) What is the purpose of Logistic Regression?


Logistic Regression produces a binary result & is used to predict the outcome of a categorical dependent variable. So the outcome should be discrete/categorical in nature, such as Yes/No, True/False or 0/1.


Logistic Regression can be used as a tool for applied statistics & discrete data analysis.
The outcome given by Logistic Regression is expressed as a probability.
Logistic Regression helps in classifying the given data.

Q.4) Why do we use Logistic Regression instead of Linear Regression?


In Linear Regression, the value of the dependent variable, i.e. the value we want to predict, is continuous and can fall anywhere in a range.



But in Logistic Regression there can be only 2 values: either 0, i.e. False, or 1, i.e. True.



So in Logistic Regression we don't need values below 0 or above 1. We flatten the parts of the line that go beyond this range and hold them constant at 0 or 1.



Once this is expressed as an equation, the curve looks like this :



This is nothing but the 'S' curve, also called the Sigmoid function or Sigmoid curve. The Sigmoid function converts any value from -∞ to ∞ into a value between 0 and 1, which is read as a probability and then turned into a discrete value of either 0 or 1.
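As a minimal sketch, the Sigmoid function can be written in a few lines of plain Python, assuming the standard form 1 / (1 + e^-z). The function name sigmoid and the sample inputs are chosen here only for illustration.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Large negative inputs approach 0, large positive inputs approach 1,
# and an input of 0 lands exactly in the middle at 0.5.
for z in (-10, -1, 0, 1, 10):
    print(z, round(sigmoid(z), 4))
```

However far the input moves towards -∞ or ∞, the output never leaves the 0 to 1 range, which is what lets it be read as a probability.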

Now let's say we get a value of 0.65: which class does it belong to? This is where the concept of the Threshold Value comes in, which divides the graph into 2 halves.


The Threshold Value is the cut-off probability that separates winning, i.e. 1, from losing, i.e. 0. A common choice is 0.5.

To check whether a given value is 1 or 0, just compare it with the Threshold Value: if the value is greater than the Threshold Value then it is 1, & if it is less than the Threshold Value then it is 0.
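The comparison with the Threshold Value can be sketched as a tiny function. The name classify and the default threshold of 0.5 are assumptions made for this illustration; the right threshold depends on the problem.

```python
def classify(probability: float, threshold: float = 0.5) -> int:
    """Return 1 if the probability is above the threshold, otherwise 0."""
    return 1 if probability > threshold else 0

print(classify(0.65))                 # 0.65 is above 0.5, so the class is 1
print(classify(0.65, threshold=0.7))  # with a stricter threshold the class is 0
```

Raising the threshold makes the model more reluctant to predict 1, and lowering it makes the model more willing to.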

To create this curve we need to make an equation.

So now let us see how an equation is formed to imitate this functionality 

NOTE: The Logistic Regression equation is derived from the equation of a straight line.

Equation of a straight line :

y = m1*x1+m2*x2+...+mn*xn+c          Range : -∞ to ∞

Let's try to derive the Logistic Regression equation from the straight-line equation. In Logistic Regression, y is a probability, so it must lie between 0 & 1 :

y          Range : 0 to 1 in Logistic Regression

Now, to get the range of y between 0 & ∞, let's transform y into the odds :

y/(1-y)  } y = 0 gives 0 & y approaching 1 gives ∞    Range : 0 to ∞

Let us transform it further by taking the logarithm, to get a range between -∞ & ∞. This matches the range of the straight line, so the two can be set equal :

log[y/(1-y)] = m1*x1+m2*x2+...+mn*xn+c  } final Logistic Regression Equation

Solving this for y gives the Sigmoid function : y = 1/(1+e^-(m1*x1+m2*x2+...+mn*xn+c))
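For readers who want the algebra, here is a sketch of how the log-odds form log[y/(1-y)] is solved for y to give the Sigmoid function. The symbol z is shorthand introduced here for the straight-line expression m1*x1+m2*x2+...+mn*xn+c, and log is taken to be the natural logarithm.

```latex
\begin{aligned}
\log\frac{y}{1-y} &= z, \qquad z = m_1 x_1 + m_2 x_2 + \dots + m_n x_n + c \\
\frac{y}{1-y} &= e^{z} \\
y &= e^{z}\,(1 - y) \\
y\,(1 + e^{z}) &= e^{z} \\
y &= \frac{e^{z}}{1 + e^{z}} = \frac{1}{1 + e^{-z}}
\end{aligned}
```

The last line is the S-shaped curve: z can be anything from -∞ to ∞, while y always stays between 0 and 1.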

Don't worry guys, you don't have to memorize these formulas: all of this math is packed into the Sklearn library.

Q.5) What is the Difference between Linear Regression and Logistic Regression?

Linear Regression vs Logistic Regression :







Q.6) What are the Examples of Logistic Regression?


Here are a few real-life examples, & we will do practicals of 2 of them.

1.     Is the mail spam or not?
2.     Will it rain today or not?
3.     Did the passenger survive the Titanic or not?
4.     Will that person purchase that apartment or not?
5.     Do you have coronavirus or not?

Q.7) Logistic Regression in Python :


Here are the 2 examples which we will be doing:

NOTE: There are 5 steps in solving a Logistic Regression problem :
(i) Import all the libraries :
In this step, we import all the libraries we will need.
(ii) Analyzing the Data :
In this step, we analyze the data using visualization & plotting libraries & functions.
(iii) Data Wrangling :
In this step, we modify the data as required.
(iv) Test & Train the Data :
In this step, we split the data into training & test sets, train the model through the fit method & then test it.
(v) Check Accuracy :
In this step, we check the accuracy of our model.

1.) Will that person purchase that apartment or not?

Q.) A construction company has released a few apartments in the market. Using previous data about the sales of their apartments, they want to predict the category of people who might be interested in buying them.

 

Step 1 : Import all the libraries


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

apt_data=pd.read_csv(r"E:\Git_repo\Logistic_House_data.csv")
apt_data.head()

output :



apt_data.info()

output :

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 4 columns):
Gender       10 non-null object
Age          10 non-null int64
Salary       10 non-null int64
Purchased    10 non-null int64
dtypes: int64(3), object(1)
memory usage: 448.0+ bytes

Step 2 : Analyzing the Data

 

# How many of the people have bought an apartment
plt.figure(figsize=(7,7))
sns.countplot(x="Purchased",data=apt_data)

output :



plt.figure(figsize=(7,7))
sns.countplot(x="Purchased",hue="Gender",data=apt_data)

output :



Step 3 : Data Wrangling

 

# Gender is a text column, so convert it to a numeric dummy variable (scikit-learn expects numeric features)
dummies_gen=pd.get_dummies(apt_data["Gender"],drop_first=True)
dummies_gen

output :



apt_data=pd.concat([apt_data,dummies_gen],axis=1)
apt_data

output :



apt_data=apt_data.drop("Gender",axis=1)
apt_data

output :
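To make the dummy-variable step above more concrete, here is a plain-Python sketch of what pd.get_dummies with drop_first=True does to a two-level text column. The four sample values are made up for illustration and are not rows from the apartment data set; pandas itself may show the result as True/False rather than 1/0, depending on the version.

```python
# Hypothetical sample values, not taken from the real data set
genders = ["Male", "Female", "Female", "Male"]

# get_dummies creates one column per level, in sorted order
levels = sorted(set(genders))   # ['Female', 'Male']

# drop_first=True removes the first level, leaving a single 'Male' column
kept = levels[1:]

rows = [{level: int(g == level) for level in kept} for g in genders]
print(rows)  # [{'Male': 1}, {'Male': 0}, {'Male': 0}, {'Male': 1}]
```

One column is enough because a 0 in the Male column already implies Female, so the dropped column carries no extra information.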



Step 4 : Test & Train the Data

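# Separate the features (X) from the target (y)
X=apt_data.drop("Purchased",axis=1)
y=apt_data["Purchased"]

# Hold out 20% of the rows for testing
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2)

from sklearn.linear_model import LogisticRegression
model=LogisticRegression()
# NOTE: this fits on all rows, including the test rows, so the test scores are optimistic.
# Use model.fit(X_train,y_train) to evaluate on data the model has not seen.
model.fit(X,y)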

output :

LogisticRegression(C=1.0, class_weight=None, 
dual=False, fit_intercept=True,intercept_scaling=1,
 l1_ratio=None, max_iter=100,multi_class='warn',
 n_jobs=None, penalty='l2',random_state=None,
 solver='warn', tol=0.0001,
 verbose=0,warm_start=False)


model.score(X,y)

output :

0.7

pred=model.predict(X_test)
pred

output :

array([0, 1], dtype=int64)

from sklearn.metrics import classification_report
val=classification_report(y_test,pred)
val

output :




y_test.value_counts()

output :

1    1
0    1
Name: Purchased, dtype: int64

from sklearn import metrics
print(metrics.confusion_matrix(y_test,pred))

output :

[[1 0]
 [0 1]]

print(metrics.recall_score(y_test,pred))

output :

1.0

Step 5 : Check Accuracy

 

from sklearn.metrics import accuracy_score
val2=accuracy_score(y_test,pred)
val2

output :
1.0
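As a rough check on the numbers above, accuracy and recall can be worked out by hand from the confusion matrix. This sketch assumes scikit-learn's layout, where rows are the actual classes and columns are the predicted classes; the variable names are chosen here for illustration.

```python
# Confusion matrix printed in the example:
# rows = actual class (0, 1), columns = predicted class (0, 1)
confusion = [[1, 0],
             [0, 1]]

(tn, fp), (fn, tp) = confusion

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
recall = tp / (tp + fn)                     # share of actual 1s that were found
precision = tp / (tp + fp)                  # share of predicted 1s that are correct

print(accuracy, recall, precision)  # 1.0 1.0 1.0
```

With only 2 test rows, a perfect score says very little about how the model would behave on new data.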


Check out my GitHub :

Q.1) Dataset apartment:

Q.1) Code apartment :


Q.2) Dataset Titanic :
Q.2) Code Titanic :

The second example is the Titanic survival problem, which you have to do on your own. If you run into any trouble while doing it, do check my Code Titanic link, which is listed above.

Hey guys, enjoyed my content? Please do leave a comment below, practice the Titanic problem & comment with your answer.
